1 Introduction

The study of stochastic combinatorial problems as well as the Probabilistic Analysis of Algorithms are
among the many subjects which use concentration inequalities. A central concentration inequality
is the Hoeffding-Azuma (H-A) inequality: for real-valued random variables $X_1, X_2, \ldots, X_n$ satisfying
respectively absolute bounds and the Martingale (difference) condition:

$$|X_i| \le 1 ; \qquad E(X_i \mid X_1, X_2, \ldots, X_{i-1}) = 0,$$

the H-A inequality asserts the following tail bound: $\Pr(|\sum_{i=1}^n X_i| \ge t) \le c_1 e^{-c_2 t^2/n}$, for some
constants $c_1, c_2$ (these are the tails of $N(0, n)$, the normal density with mean 0 and variance $n$, but for
constants). Here, we present two theorems both of which considerably weaken the assumption of an
absolute bound, as well as the Martingale condition, while retaining the strength of the conclusion.
As consequences of our theorems, we derive new concentration results for many combinatorial
problems.

Our Theorem 1 is simply stated. It weakens the absolute bound of 1 on $|X_i|$ to a condition weaker
than a bound of 1 on some moments (upto the $m$-th moment) of $X_i$. It weakens the Martingale
difference assumption to requiring that certain correlations be non-positive. The conclusion upper
bounds $E(\sum_{i=1}^n X_i)^m$ (see footnote 1) (essentially) by the $m$-th moment of $N(0, n)$; it will be easy to get tail bounds
from these moment bounds. Note that both the hypotheses and the conclusion involve bounds on
moments upto the same $m$; so finitely many moments are sufficient to get some conclusions, unlike in H-A
as well as Chernoff bounds, in both of which one uses the absolute bound to get a bound on $E(e^{X_i})$.
Note that if the $X_i$ have power law tails (with only finitely many moments bounded), no automatic bound on
$E(e^{X_i})$ is available. But both the H-A inequality and Chernoff bounds follow as very special cases of
our Theorem 1.

The study of the minimum length of a Hamilton tour through n random points chosen in i.i.d.
trials from the uniform density in the unit square was started by the seminal work of Beardwood,
Halton and Hammersley [10]. The algorithmic question of finding an approximately optimal
Hamilton tour in this i.i.d. setting was tackled by Karp [32]; his work not only pioneered the
field of Probabilistic Analysis of Algorithms, but also inspired later TSP algorithms for deterministic
inputs, like Arora's [7]. Earlier hard concentration results for the minimal length of a Hamilton tour
in the i.i.d. case were made easy by Talagrand's inequality [43]. But all these concentration results
for the Hamilton tour problem as well as many other combinatorial problems [41] make crucial use
of the fact that the points are i.i.d. and so random variables like the number of points in a region 
in the unit square are very concentrated - have exponential tails. In the modern setting, heavier 
tailed distributions are of interest. There are many models of what "heavy-tailed" distribution 
should mean; this is not the subject of this paper. But as we will see, our theorems are amenable to 
"bursts in space" , where each region of space chooses (independently) the number of points that fall 
in it, but then may choose that many points possibly adversarially; further, the number of points 
may have power-law tails instead of exponential tails. In other problems, one may have "bursts 
in time", where, each time unit may choose from a power-law tailed distribution the number of 
arrivals/ new items/jobs. 

Using Theorem 1, we are able to prove as strong concentration as was known for the i.i.d. case 
of TSP (but for constants), but, now allowing bursts in space. We do the same for the minimum
weight spanning tree problem as well. We then consider random graphs where edge probabilities are 
not equal. We show a concentration result for the chromatic number (which has been well-studied 
under the traditional model with equal edge probabilities.) In these cases, we use the traditional 
Doob Martingale construction to first cast the problem as a Martingale problem. The moment 
conditions needed for the hypotheses of our theorems follow naturally. 

But an application where we do not have a Martingale, but do have the weaker hypothesis 
of Theorem 1 is when we pick a random vector (s) of unit length as in the well-known Johnson- 
Lindenstrauss (JL) Theorem on Random Projections. Using Theorem 1, we prove a more general 
theorem than JL where heavier-tailed distributions are allowed. 

A further weakening of the hypotheses of H-A is obtained in our Main Theorem, whose proof is
more complicated. In the Main Theorem, we use information on conditional moments
of $X_i$ conditioned on "typical values" of $X_1 + X_2 + \cdots + X_{i-1}$ as well as the "worst-case" values.
This is very useful in many contexts as we show. Using Theorem 2, we settle the concentration
question for (the discrete case of) the well studied stochastic bin-packing problem [17], proving
concentration results which we show are best possible. Here, we prove a bound on the variance
of $X_i$ using Linear Programming duality; we then exploit a feature of the Main Theorem (which is also
present in Theorem 1): higher moments have lower weight in our bounds, so for bin-packing, it
turns out that higher moments don't need to be carefully bounded. This feature is also used for the
next application, which is the well-studied problem of proving concentration for the number $X$ of
triangles in the standard random graph $G(n,p)$. While many papers have proved good tail bounds
for large deviations, we prove here the first sub-Gaussian tail bounds for all values of $p$, namely
that $X$ has $N(0, \mathrm{Var}\,X)$ tails for deviations upto $(np)^{7/4}$ (see Definition 1). [Such sub-Gaussian
bounds were partially known for the easy case when $p \ge 1/\sqrt{n}$, but not for the harder case of
smaller $p$.] We also give a proof of concentration for the longest increasing subsequence problem.
It is hoped that the theorems here will provide a tool to deal with heavy-tailed distributions and
inhomogeneity in other situations as well.

There have been many sophisticated probability inequalities. Besides H-A (see McDiarmid [35]
for many useful extensions) and Chernoff, Talagrand's inequality already referred to ([43]) has
numerous applications. Burkholder's inequality for Martingales [15] and many later developments
give bounds based on finite moments. A crucial point here is that unlike the other inequalities,
different moments have different weights in the bounds (the second moment has the highest) and
this helps get better tail bounds. We will discuss comparisons of our results with these earlier
results in Section 14. But one more note is timely here: many previous inequalities have also used
Martingale bounds after excluding "atypical" cases. But usually, they insist on an absolute bound
in the typical case, whereas here we only insist on moment bounds. It is important to note that
many (probably all) individual pieces of our approach have been used before; the contribution here
is in carrying out the particular combination of them which is then able to prove results for a wide
range of applications.



2 Theorem 1 

In the theorem below, we weaken the absolute bound $|X_i| \le 1$ of H-A to (2). Since this will be
usually applied with $n \ge m$, (2) will be weaker than $E(X_i^l \mid X_1 + X_2 + \cdots + X_{i-1}) \le 1$, which is in
turn weaker than the absolute bound $|X_i| \le 1$. We replace the Martingale difference condition
$E(X_i \mid X_1, X_2, \ldots, X_{i-1}) = 0$ by the obviously weaker condition (1), which we will call strong negative
correlation; it is only required for odd $l$, which we see later relates to negative correlation. Also, we
only require these conditions for all $l$ upto a certain even $m$. We prove a bound on the (same) $m$-th
moment of $\sum_{i=1}^n X_i$. Thus, the higher the moment bounded by the hypothesis,
the higher the moment bounded by the conclusion. This in particular will allow us to handle things
like "power-law" tails. The following definition will be useful to describe tail bounds.

Definition 1. Let $a, \sigma$ be positive reals. We say that a random variable $X$ has $N(0, \sigma^2)$ tails upto
$a$ if there exist constants $c, c'$ such that for all $t \in [0, a]$, we have

$$\Pr(|X - EX| \ge t) \le c' e^{-ct^2/\sigma^2}.$$

Here there is a hidden parameter $n$ (which will be clear from the context) and the constants
$c, c'$ are independent of $n$, whereas $a, \sigma$ could depend on $n$.
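To make Definition 1 concrete, the following sketch checks the defining inequality on a finite grid of values of $t$. The function name `has_normal_tails_upto`, the grid size, the constants $c = 1/2$, $c' = 2$ and the power-law example are our own illustrative choices, not part of the paper; since only finitely many points are tested, this illustrates the definition rather than proving anything.

```python
import math

def has_normal_tails_upto(tail, sigma2, a, c, c_prime, steps=1000):
    """Check Pr(|X - EX| >= t) <= c' * exp(-c t^2 / sigma2) on a grid of t in [0, a].

    `tail(t)` must return Pr(|X - EX| >= t). All names are illustrative.
    """
    for j in range(steps + 1):
        t = a * j / steps
        if tail(t) > c_prime * math.exp(-c * t * t / sigma2):
            return False
    return True

# Standard Gaussian: Pr(|Z| >= t) = erfc(t / sqrt(2)).
gaussian_tail = lambda t: math.erfc(t / math.sqrt(2))

# An assumed power-law example: Pr(|X| >= t) = min(1, t^-2).
power_law_tail = lambda t: 1.0 if t <= 1 else t ** -2.0
```

With these constants the Gaussian passes for every range we tried, while the power-law example passes only for small $a$, which is the situation the phrase "tails upto $a$" is meant to capture.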

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be real valued random variables and $m$ an even positive integer
satisfying the following for $i = 1, 2, \ldots, n$:

$$E X_i (X_1 + X_2 + \cdots + X_{i-1})^l \le 0, \quad l < m,\ l \text{ odd}. \tag{1}$$
$$E(X_i^l \mid X_1 + X_2 + \cdots + X_{i-1}) \le \Big(\frac{n}{m}\Big)^{(l-2)/2} l!, \quad l \le m,\ l \text{ even}. \tag{2}$$

Then, we have

$$E\Big(\sum_{i=1}^n X_i\Big)^m \le (48nm)^{m/2}.$$

$\sum_{i=1}^n X_i$ has $N(0, n)$ tails upto $\sqrt{nm}$.

Remark 1. Since under the hypothesis of (H-A), (1) and (2) hold for all $m$, (H-A) follows from the
last statement of the theorem. We will also show that Chernoff bounds follow as a simple corollary.
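As a sanity check of the moment bound in the simplest setting covered by Remark 1, the sketch below computes $E(\sum_i X_i)^m$ exactly for i.i.d. uniform $\pm 1$ variables (which satisfy the H-A hypotheses, and hence (1) and (2) when $n \ge m$) and compares it with $(48nm)^{m/2}$. The function names are ours; the example only illustrates the statement and says nothing about how tight the constant 48 is.

```python
from math import comb

def rademacher_sum_moment(n, m):
    """Exact E(X_1 + ... + X_n)^m for i.i.d. uniform +/-1 variables X_i.

    The sum equals 2k - n when exactly k of the variables are +1.
    Returns (numerator, denominator) of the moment as exact integers.
    """
    numerator = sum(comb(n, k) * (2 * k - n) ** m for k in range(n + 1))
    return numerator, 2 ** n

def theorem1_moment_bound(n, m):
    """The bound (48 n m)^(m/2) from the conclusion of Theorem 1 (m even)."""
    return (48 * n * m) ** (m // 2)
```

For example, with $n = 10$ and $m = 4$ the exact fourth moment is $3n^2 - 2n = 280$, far below the bound $(48 \cdot 10 \cdot 4)^2$.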

Remark 2. Note that for the upper bound in (2), we have

The last quantity is an increasing function of $l$ when $n \ge m$, which will hold in most applications.
Thus the requirements on $(E(X_i^l \mid X_1 + X_2 + \cdots + X_{i-1}))^{1/l}$ are the "strongest" for $l = 2$ and the
requirements get progressively "weaker" for higher moments. This will be useful, since, in applications,
it will be easier to bound the second moment than the higher ones. The same qualitative
aspect also holds for the Main Theorem.

Remark 3. Here, we give one comparison of Theorem 1 with perhaps the closest result to it in the
literature, namely a result proved by de la Peña ((1.7) of [18], slightly specialized to our situation)
which asserts: If $X_1, X_2, \ldots, X_n$ is a Martingale difference sequence with $E(X_i^2 \mid X_1, X_2, \ldots, X_{i-1}) \le 2$
for all $i$ and $E(X_i^l \mid X_1, X_2, \ldots, X_{i-1}) \le (l!/2)\, a^{(l/2)-1}$ for all positive even integers $l$, where
$a$ is some fixed real, then



1 E will denote the expectation of the entire expression which follows. 
It is easy to see that this implies $N(0,n)$ tails upto $n/\sqrt{a}$.

Setting $a = n/m$, the hypothesis of Theorem 1 implies [18]'s hypothesis upto $l \le m$, not for all
$l$ as required there. Were we to be given this hypothesis for all $l$ and furthermore assume the $X_i$ are
Martingale differences (rather than the more general condition), then since $n/\sqrt{a} = \sqrt{nm}$, we
would get the same conclusion as Theorem 1. [18]'s result is stronger in other directions (which
we won't discuss here), but a main point of our theorem is to assume only finitely many moments since we
would like to deal with long-tailed distributions. Further, note that we can apply our theorem with
$m = O(\sqrt{n})$, whence it allows moment bounds to grow with $n$, unlike [18].

Proof Let $M_l = \mathrm{MAX}_{i=1}^n E(X_i^l \mid X_1 + X_2 + \cdots + X_{i-1})$ for even $l \le m$. For $1 \le i \le n$ and
$q \in \{0, 2, 4, \ldots, m-2, m\}$, define

$$f(i,q) = E\Big(\sum_{j=1}^i X_j\Big)^q.$$

Using the two assumptions, we derive the following recursive inequality for $f(n,m)$, which we will
later solve (much as one does in a Dynamic Programming algorithm):

$$f(n,m) \le f(n-1,m) + \frac{11}{5} \sum_{t \in \{2,4,6,\ldots,m\}} \binom{m}{t} M_t\, f(n-1,m-t). \tag{3}$$

Proof of (3): Let $A = X_1 + X_2 + \cdots + X_{n-1}$. Let $a_l = \binom{m}{l} E|X_n|^l |A|^{m-l}$. Expanding $(A + X_n)^m$, we
get

$$E(A + X_n)^m \le EA^m + m\,EX_nA^{m-1} + \sum_{l=2}^{m} a_l. \tag{4}$$

Now, we note that $EX_nA^{m-1} \le 0$ by hypothesis (1) and so the second term may be dropped. [In
fact, this would be the only use of the Martingale difference condition if we had assumed it; we
use strong negative correlation (SNC) instead, since it clearly suffices.] We will next bound the "odd terms" in terms of the two
even terms on the two sides using a simple "log-convexity" of moments argument. For odd $l \ge 3$,
we have

$$E|X_n|^l|A|^{m-l} \le E\Big(|X_n|^{\frac{l+1}{2}}|A|^{\frac{m-l-1}{2}}\,|X_n|^{\frac{l-1}{2}}|A|^{\frac{m-l+1}{2}}\Big) \le \big(E(X_n^{l+1}A^{m-l-1})\big)^{1/2}\big(E(X_n^{l-1}A^{m-l+1})\big)^{1/2}.$$
Also, - <



So, $a_l$ is at most 6/5 times the geometric mean of $a_{l+1}$ and $a_{l-1}$ and hence is at most 6/5 times
their arithmetic mean. Plugging this into (4), we get

$$E\Big(\sum_{i=1}^n X_i\Big)^m \le EA^m + \frac{11}{5}(a_2 + a_4 + \cdots + a_m). \tag{5}$$

Now, we use the standard trick of "integrating over" $X_n$ first and then over $A$ (which is also crucial
for proving H-A) to get for even $l$: $EX_n^lA^{m-l} = E_A(A^{m-l}E_{X_n}(X_n^l \mid A)) \le M_l\,EA^{m-l}$, which yields
(3).

We view (3) as a recursive inequality for $f(n,m)$. We will use this same inequality for the proof
of the Main Theorem, but there we use an inductive proof; here, instead, we will now "unravel"
the recursion to solve it. [Note that we cannot use induction since we only know the upper bound
involving $(n/m)^{(l/2)-1}$ on the moments (as in the hypothesis of the theorem) and as $n$ decreases
for induction, this bound gets tighter.] Note that dropping the term $EX_nA^{m-1}$ ensured that the
coefficient of $EA^m$ is 1 instead of the 11/5 we have in front of the other terms. This is important:
if we had 11/5 instead, since the term does not reduce $m$, but only $n$, we would get a $(11/5)^n$ when
we unwind the recursion. This is no good; we can get $m$ terms in the exponent in the final result,
but not $n$.

Figure 1: Recursion Tree

Imagine a directed graph (see Figure 1, Recursion Tree) constructed as follows: The graph has a
root marked $f(n,m)$. The root has $(m/2)+1$ directed edges out of it going to $(m/2)+1$ nodes
marked (respectively) $f(n-1,m), f(n-1,m-2), \ldots, f(n-1,0)$. The edges have weights associated
with them which are (respectively) $1, \frac{11}{5}\binom{m}{2}M_2, \frac{11}{5}\binom{m}{4}M_4, \ldots, \frac{11}{5}\binom{m}{m}M_m$. In general, a node of the
directed graph marked $f(i,q)$ (for $i \ge 2$, $0 \le q \le m$, $q$ even) has $(q/2)+1$ edges going from it
to nodes marked $f(i-1,q), f(i-1,q-2), \ldots, f(i-1,0)$; these edges have "weights" respectively
$1, \frac{11}{5}\binom{q}{2}M_2, \frac{11}{5}\binom{q}{4}M_4, \ldots, \frac{11}{5}\binom{q}{q}M_q$, which are respectively at most

$$1,\ \frac{11}{5}\frac{m^2}{2!}M_2,\ \frac{11}{5}\frac{m^4}{4!}M_4,\ \ldots,\ \frac{11}{5}\frac{m^q}{q!}M_q.$$

A node marked $f(1,q)$ has one child, a leaf marked $f(0,0)$, connected by an edge of weight $M_q$.
Define the weight of a path from a node to a leaf as the product of the weights of the edges along
the path. It is easy to show by induction on the depth of a node that $f(i,q)$ is at most the sum of weights
of all paths from the node marked $f(i,q)$ to a leaf. [For example, if the assertion holds for all $i \le n-1$,
then (3) implies that it holds for the root.] We do not formally prove this here. A similar (slightly
more complicated) Lemma will be proved during the proof of the Main Theorem.
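The claim that unravelling the recursion gives a sum over root-to-leaf paths can be checked on small cases. The sketch below treats (3) as an equality with edge weights $\frac{11}{5}\binom{q}{t}M_t$, solves it by dynamic programming, and separately enumerates all paths, indexed by tuples $(l_n, \ldots, l_1)$ of even non-negative integers summing to $m$. The function names and the sample values of $M_t$ are our own assumptions for illustration; we also assume $M_0 = 1$.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb

def recursion_value(n, m, M):
    """Solve g(i,q) = g(i-1,q) + (11/5) * sum_{t=2,4,..,q} C(q,t) M[t] g(i-1,q-t),
    with g(i,0) = 1 and g(1,q) = M[q]: inequality (3) taken with equality."""
    @lru_cache(maxsize=None)
    def g(i, q):
        if q == 0:
            return Fraction(1)
        if i == 1:
            return Fraction(M[q])
        total = g(i - 1, q)                      # edge of weight 1
        for t in range(2, q + 1, 2):
            total += Fraction(11, 5) * comb(q, t) * M[t] * g(i - 1, q - t)
        return total
    return g(n, m)

def path_sum(n, m, M):
    """Sum over all root-to-leaf paths of the product of edge weights."""
    total = Fraction(0)
    for ls in product(range(0, m + 1, 2), repeat=n):  # ls = (l_n, ..., l_1)
        if sum(ls) != m:
            continue
        weight, q = Fraction(1), m
        for l in ls[:-1]:                        # levels n, n-1, ..., 2
            if l > 0:
                weight *= Fraction(11, 5) * comb(q, l) * M[l]
            q -= l
        weight *= M[q]                           # level 1: single edge of weight M_q
        total += weight
    return total
```

Exact rational arithmetic is used so that the two computations can be compared with `==` rather than up to rounding.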

Now, there is a 1-1 correspondence between paths from $f(n,m)$ to a leaf and elements of the
following set: $L = \{(l_1, l_2, \ldots, l_n) : l_i \ge 0, \text{ even};\ \sum_{i=1}^n l_i = m\}$; $l_i$ indicates that at level $i$ we take
the $l_i$-th edge, i.e., we go from node $f(i, m - l_n - l_{n-1} - \cdots - l_{i+1})$ to $f(i-1, m - l_n - l_{n-1} - \cdots - l_i)$
on this path. For an $l = (l_1, l_2, \ldots, l_n) \in L$ and $t \in \{0, 2, 4, \ldots, m\}$, define

$$g_t(l) = \text{number of } i \text{ with } l_i = t.$$

Clearly, the vector $g(l) = (g_0(l), g_2(l), \ldots, g_m(l))$ belongs to the set

$$H = \Big\{h = (h_0, h_2, h_4, \ldots, h_m) : \sum_t t\,h_t = m;\ h_t \ge 0;\ \sum_t h_t = n\Big\}.$$

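Since the weight of an edge corresponding to $l_i$ at any level is at most $\big(\frac{11}{5}\big)^{z_i}\frac{m^{l_i}}{l_i!}M_{l_i}$, where $z_i = 1$ iff
$l_i \ge 2$, and the number of non-zero $z_i$ along any path is at most $m/2$, we have

$$f(n,m) \le \Big(\frac{11}{5}\Big)^{m/2} \sum_{l \in L} \prod_t \frac{(m^t M_t)^{g_t(l)}}{(t!)^{g_t(l)}}.$$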

For an $h \in H$, the number of $l \in L$ with $g_t(l) = h_t\ \forall t$ is the number of ways of picking subsets of
the $n$ variables of cardinalities $h_0, h_2, h_4, \ldots, h_m$, namely,

$$\binom{n}{h_0, h_2, h_4, \ldots, h_m} = \frac{n!}{h_0!\,h_2!\,h_4! \cdots h_m!} \le \frac{n^{h_2+h_4+\cdots+h_m}}{h_2!\,h_4! \cdots h_m!}.$$
Thus, we have (using the assumed upper bound on conditional moments)

$$f(n,m) \le \Big(\frac{11}{5}\Big)^{m/2} \sum_{h \in H} \frac{n^{h_2+h_4+\cdots+h_m}}{h_2!\,h_4! \cdots h_m!} \prod_{t \ge 2} m^{t h_t}\Big(\frac{n}{m}\Big)^{h_t((t/2)-1)}$$

$$\le \Big(\frac{11}{5}\,nm\Big)^{m/2} |H|\ \mathrm{MAX}_{h \in H} \prod_{t \ge 2,\ h_t \ne 0} \Big(\frac{em}{h_t}\Big)^{h_t}, \tag{6}$$

using Stirling inequality for factorial. Now we will show that the maximum is attained when
$h_2 = m/2$ and the other $h_t$ are all zero. In what follows $t$ only ranges over values $\ge 2$ for which
$h_t \ne 0$.

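$$\prod_t \Big(\frac{em}{h_t}\Big)^{h_t} = \prod_t (et)^{h_t}\Big(1 + \Big(\frac{m}{t h_t} - 1\Big)\Big)^{h_t} \le \exp\Big(\sum_t \Big(\frac{m}{t} + h_t \ln t\Big)\Big),$$

using $1 + x \le e^x$ for all real $x$. Now, the function $\sum_t \big(\frac{m}{t} + h_t \ln t\big)$ (considered as a function of the
$h_t$) is linear and so its maximum over the simplex $h \ge 0$; $\sum_t t\,h_t = m$ is attained at an extreme
point. Hence $\sum_t \big(\frac{m}{t} + h_t \ln t\big) \le \mathrm{MAX}_t \big(\frac{m}{t} + \frac{m \ln t}{t}\big)$. Now considered as a function of $t$, $\frac{m}{t} + \frac{m \ln t}{t}$
is decreasing, so the maximum of this over our range is at $t = 2$. Thus, we have

$$\prod_t \Big(\frac{em}{h_t}\Big)^{h_t} \le (2e)^{m/2}. \tag{7}$$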

Now, we bound $|H|$: each element of $H$ corresponds to a unique $m/2$-vector $(h_2, 2h_4, 3h_6, \ldots)$ with
coordinates summing to $m/2$. Thus $|H|$ is at most the number of partitions of $m/2$ into $m/2$ parts,
which is at most $\binom{m}{m/2} \le 2^m$. Plugging this and (7) into (6), we get the moment bound in the theorem.
The bound on the $m$-th moment of $\sum_{i=1}^n X_i$ in the theorem will be used in a standard fashion to get
tail bounds. For any $t$, by Markov inequality, we get from the theorem $\Pr(|\sum_{i=1}^n X_i| \ge t) \le \big(\frac{48nm}{t^2}\big)^{m/2}$.
The right hand side is minimized at $m = t^2/(cn)$ for a suitable constant $c$. So since the hypothesis of the theorem holds for
this $m$, we get the claimed tail bounds.

□ 
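The last step of the proof, passing from the moment bound to a tail bound, is a one-variable optimization that can be written out numerically. The sketch below assumes the moment bound $(48nm)^{m/2}$ of Theorem 1, applies Markov's inequality, and uses the real-valued minimizer $m = t^2/(48en)$, at which the bound equals $e^{-t^2/(96en)}$. The function names are ours; in the theorem $m$ must of course be an even integer for which the hypotheses hold, which this arithmetic sketch ignores.

```python
import math

def markov_tail_bound(n, m, t):
    """Markov's inequality applied to the m-th moment bound (48 n m)^(m/2):
    Pr(|sum X_i| >= t) <= (48 n m / t^2)^(m/2)."""
    return (48.0 * n * m / (t * t)) ** (m / 2.0)

def real_minimizer(n, t):
    """Real-valued m minimizing the bound above: m = t^2 / (48 e n)."""
    return t * t / (48.0 * math.e * n)
```

The resulting exponent $-t^2/(96en)$ is of the form $-ct^2/n$, which is what "N(0, n) tails" asks for in Definition 1.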

The following Corollary is a strengthening of Chernoff Bounds. 
Corollary 2. Suppose $X_1, X_2, \ldots, X_n$ are real valued random variables, $\sigma$ a positive real and $m$
an even positive integer such that

$$E(X_i^k \mid X_1 + X_2 + \cdots + X_{i-1}) \le \sigma^2 \quad \text{for } k \text{ even},\ k \le m,$$

$$EX_i(X_1 + X_2 + \cdots + X_{i-1})^k \le 0 \quad \text{for } k \text{ odd},\ k \le m.$$

Then, $E(\sum_{i=1}^n X_i)^m \le (c_2 nm\sigma^2)^{m/2}$ and $\sum_{i=1}^n X_i$ has $N(0, n\sigma^2)$ tails upto $\mathrm{Min}(n\sigma^2, \sqrt{mn}\,\sigma)$.

Proof Let $t \in [0, \mathrm{Min}(n\sigma^2, \sqrt{mn}\,\sigma)]$. We will apply the theorem with $m$ equal to the
even integer nearest to $t^2/(c_1 n\sigma^2)$ for a suitable $c_1 > 2$. Since $t \le n\sigma^2$, it is easy to see that
$\sigma^2 \le \sigma^k (n/m)^{(k/2)-1}$ for any even $k$, so the hypothesis of the theorem applies to the set of random
variables $(X_1/\sigma), (X_2/\sigma), \ldots, (X_n/\sigma)$. So from the theorem, we get that

$$E\Big(\sum_{i=1}^n X_i\Big)^m \le (c_2 nm\sigma^2)^{m/2}$$

and so by Markov, we get

$$\Pr\Big(\Big|\sum_{i=1}^n X_i\Big| \ge t\Big) \le \Big(\frac{c_2 nm\sigma^2}{t^2}\Big)^{m/2}.$$

Now choose $c_1$ suitably so that $\frac{c_2 nm\sigma^2}{t^2} \le \frac{1}{2}$ and we get the Corollary.

Remark 4. The set-up for Chernoff bounds is: $X_1, X_2, \ldots, X_n$ are i.i.d. Bernoulli random variables
with $EX_i = \nu$. For any $t \le n\nu$ Chernoff bounds assert: $\Pr(|\sum_{i=1}^n (X_i - \nu)| \ge t) \le e^{-ct^2/(n\nu)}$. We
get this from the Corollary applied to $X_i - \nu$, since $E(X_i - \nu)^2 \le \nu$ and since $|X_i - \nu| \le 1$, higher
even moments of $X_i - \nu$ are at most the second moment. So, the hypotheses of the Corollary hold
with $\sigma^2 = \nu$ and we can apply it.
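The moment condition used in Remark 4 is easy to verify exactly. For a Bernoulli variable $X$ with $\Pr(X = 1) = \nu$, the centered moment is $E(X-\nu)^k = \nu(1-\nu)^k + (1-\nu)(-\nu)^k$, and the sketch below evaluates it in exact rational arithmetic. The function name is ours and the particular values of $\nu$ tested are arbitrary.

```python
from fractions import Fraction

def centered_bernoulli_moment(nu, k):
    """Exact E(X - nu)^k for X Bernoulli with Pr(X = 1) = nu."""
    nu = Fraction(nu)
    return nu * (1 - nu) ** k + (1 - nu) * (-nu) ** k
```

For even $k$ the value never exceeds $\nu$, which is the hypothesis of Corollary 2 with $\sigma^2 = \nu$.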

The general Chernoff bounds deal with the case when the Bernoulli trials are independent, but 
not identical - EXi may be different for different i. This unfortunately is one of the points this 
simple theorem cannot deal with. However, the Main Theorem does deal with it and we can derive 
the general Chernoff bounds as a simple corollary of that theorem - see Remark 



3 Functions of independent random variables 

Theorem 1 and the Main Theorem will often be applied to a real-valued function $f(Y_1, Y_2, \ldots, Y_n)$
of independent (not necessarily real-valued) random variables $Y_1, Y_2, \ldots$ to show concentration of
$f$. This is usually done using Doob's Martingale construction, which we recall in this section.
While there is nothing new in this section, we will introduce notation used throughout the paper.

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables. Denote $Y = (Y_1, Y_2, \ldots, Y_n)$. Let $f(Y)$ be a
real-valued function of $Y$. One defines the classical Doob's Martingale:

$$X_i = E(f \mid Y_1, Y_2, \ldots, Y_i) - E(f \mid Y_1, Y_2, \ldots, Y_{i-1}).$$

It is a standard fact that the $X_i$ form a Martingale difference sequence and so (1) is satisfied. We
will use the short-hand $E^i f$ to denote $E(f \mid Y_1, Y_2, \ldots, Y_i)$, so

$$X_i = E^i f - E^{i-1} f.$$

Let $Y^{(i)}$ denote the $(n-1)$-tuple of random variables $Y_1, Y_2, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n$ and suppose
$f(Y^{(i)})$ is also defined. Let

$$\Delta_i = f(Y) - f(Y^{(i)}).$$

Then,

$$X_i = E^i \Delta_i - E_{Y_i}(E^i \Delta_i), \tag{8}$$

since $Y^{(i)}$ does not involve $Y_i$. $f, Y_i, X_i, \Delta_i$ will all be reserved for these quantities throughout the
paper. We use $c$ to denote a generic constant which can have different values.
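The Doob construction can be carried out by brute force when the $Y_i$ take few values. The sketch below assumes, purely for illustration, that the $Y_i$ are independent fair bits and that $f$ is the length of the longest run of 1s; neither choice comes from the paper. It computes $E^i f$ by averaging over the unrevealed coordinates and returns the differences $X_i = E^i f - E^{i-1} f$.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation(f, prefix, n):
    """E(f | Y_1..Y_i = prefix) for independent fair bits Y_1..Y_n (assumed model)."""
    rest = n - len(prefix)
    total = sum(Fraction(f(prefix + tail)) for tail in product((0, 1), repeat=rest))
    return total / 2 ** rest

def doob_differences(f, y):
    """X_i = E^i f - E^{i-1} f evaluated along the outcome y = (y_1, ..., y_n)."""
    n = len(y)
    levels = [conditional_expectation(f, tuple(y[:i]), n) for i in range(n + 1)]
    return [levels[i] - levels[i - 1] for i in range(1, n + 1)]

def longest_run_of_ones(y):
    """Example function f(Y): length of the longest block of consecutive 1s."""
    best = cur = 0
    for bit in y:
        cur = cur + 1 if bit else 0
        best = max(best, cur)
    return best
```

Two facts can then be checked directly: the differences telescope to $f(Y) - Ef$, and each $X_i$ averages to zero given $Y_1, \ldots, Y_{i-1}$, which is the Martingale difference condition.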

4 Random TSP with Inhomogeneous, heavy-tailed distributions 

One of the earliest problems to be studied under Probabilistic Analysis [UJ is the concentration of 
the length $f$ of the shortest Hamilton cycle through a set of $n$ points picked uniformly independently
at random from a unit square. Similarly, Karp's algorithm for the problem [32] was one of the
earliest polynomial time algorithms for the random variant of a problem which is NP-hard in the
worst-case; see also [32]. It is known that $Ef \in \Theta(\sqrt{n})$ and that $f$ has $N(0,1)$ tails. This was proved
after many earlier steps by Rhee and Talagrand [38], and Talagrand's inequality yielded a simpler
proof of this. But Talagrand's method works only for independent points; under independence,
the number of points in any sub-region of the unit square follows a Poisson distribution, which has
exponentially falling tails. Here, we will give a simple self-contained proof of the concentration
result for more general distributions (of number of points in sub-regions) than the Poisson. Two
important points of our more general distribution are

• Inhomogeneity (some areas of the unit square having greater probability than others) is 
allowed. 

• heavier tails (for example with power-law distributions) than the Poisson are allowed. 

We divide the unit square into $n$ small squares, each of side $1/\sqrt{n}$. We will generate at random a
set $Y_i$ of points in the $i$-th small square, for $i = 1, 2, \ldots, n$. We assume that the $|Y_i|$ are independent,
but not necessarily identical random variables. Once the $|Y_i|$ are chosen, the actual sets $Y_i$ can be
chosen in any (possibly dependent) manner (subject to the cardinalities being what was already
chosen). This thus allows for collusion where points in a small square can choose to bunch together
or be spread out in any way.

Theorem 3. Suppose there is a fixed $c_1 \in (0,1)$, an even positive integer $m \le n$, and an $\epsilon > 0$,
such that for $1 \le i \le n$ and $1 \le l \le m/2$,

$$\Pr(|Y_i| = 0) \le c_1 ; \qquad E|Y_i|^l \le (O(l))^{(2-\epsilon)l}.$$

Suppose $f = f(Y_1, Y_2, \ldots, Y_n)$ is the length of the shortest Hamilton tour through $Y_1 \cup Y_2 \cup \ldots \cup Y_n$.
Then, $f$ has $N(0,1)$ tails upto $\sqrt{m}$.

Remark 5. If each $Y_i$ is generated according to a Poisson of intensity 1 (= Area of small square
times $n$), then $E|Y_i|^l \le l^l$ and so the conditions of the theorem are satisfied for all $m$ (with room
to spare).

Remark 6. Note that if the hypothesis holds only upto a certain $m$, we get normal tails upto $\sqrt{m}$.
So for example $|Y_i|$ can have power law tails and we still get a result, whereas the older results
require exponential tails.

Proof Order the small squares in $\sqrt{n}$ layers: the first layer consists of all squares touching
the bottom or left boundary; the second layer consists of all squares which are 1 square away from
the bottom and left boundary etc. until the last layer is the top right square (order within each layer
is arbitrary). Fix an $i$. Let $S_i$ be the $i$-th square. Let $\tau = \tau(Y_{i+1}, \ldots, Y_n)$ be the minimum distance
from a point of $S_i$ to a point in $Y_{i+1} \cup \ldots \cup Y_n$ and $\tau_0 = \mathrm{Min}(\tau, 2\sqrt{2})$. $\tau_0$ depends only on $Y_{i+1}, \ldots, Y_n$.
(So, $E^i\tau_0 = E\tau_0$.) We wish to bound $\Delta_i = f(Y) - f(Y^{(i)})$ (see notation in Section 3). For this,
suppose we had a tour $T$ through $Y^{(i)}$. We can break this tour at a point in $Y_{i+1} \cup Y_{i+2} \cup \ldots \cup Y_n$ (if it
is not empty) closest to $S_i$, detour to $S_i$, do a tour of $Y_i$ and then return to $T$. If $Y_{i+1} \cup Y_{i+2} \cup \ldots \cup Y_n$
is empty, we just break $T$ at any point and do a detour through $Y_i$. So, we have

$$\Delta_i \le \tau_0 + \text{dist. from a point in } S_i \text{ to a point in } Y_i + \text{length of tour through } Y_i + \tau_0$$
$$\le 2\tau_0 + O(1/\sqrt{n}) + f(Y_i).$$



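Since $\Delta_i \ge 0$, we get using (8) for any even $l$:

$$-E_{Y_i}(E^i\Delta_i) \le X_i \le E^i\Delta_i \le 2E\tau_0 + O(1/\sqrt{n}) + f(Y_i) \implies E^{i-1}X_i^l \le c^l\Big((E\tau_0)^l + \frac{1}{n^{l/2}} + \frac{1}{n^{l/2}}E|Y_i|^{l/2}\Big) \tag{9}$$

where the last step uses the following well-known fact [41].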



Claim 1. For any square $B$ of side $a$ in the plane and any set of $s$ points in $B$, there is a Hamilton
tour through the points of length at most $ca\sqrt{s}$.

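First focus on $i \le n - 100\ln n$. We will see that we can get a good bound on $E\tau_0$ for these $i$.
For any $\lambda \in [0, 5\sqrt{\ln n}/\sqrt{n}]$, there is a square region $T_\lambda$ of side $\lambda$ inside $S_{i+1} \cup \ldots \cup S_n$ (indeed, inside
the later layers) which touches $S_i$. So, $\Pr(\tau \ge \sqrt{2}\lambda) \le \Pr(T_\lambda \cap (Y_{i+1} \cup \ldots \cup Y_n) = \emptyset) \le e^{-cn\lambda^2}$ by the
hypothesis that $\Pr(|Y_i| = 0) \le c_1 < 1$. This implies that

$$E\tau_0 \le \Pr\big(\tau \ge 5\sqrt{\ln n}/\sqrt{n}\big)(2\sqrt{2}) + E\big(\tau \mid \tau \le 5\sqrt{\ln n}/\sqrt{n}\big)$$

$$\le \frac{c}{\sqrt{n}} + \Big(E\big(\tau^2 \mid \tau \le 5\sqrt{\ln n}/\sqrt{n}\big)\Big)^{1/2} \le \frac{c}{\sqrt{n}} + \Big(c\int_0^\infty \lambda e^{-cn\lambda^2}\,d\lambda\Big)^{1/2} \le \frac{c}{\sqrt{n}}.$$

Plugging this and the fact that $E|Y_i|^{l/2} \le (O(l))^{(2-\epsilon)(l/2)} \le (O(l))^l$ into (9), we get
$E^{i-1}X_i^l \le (cl)^l / n^{l/2}$.

We now apply Theorem 1 to $c_5\sqrt{n}X_i$, for $i = 1, 2, \ldots, n - 100\ln n$ to get

$$E\Big(\sum_{i=1}^{n-100\ln n} X_i\Big)^m \le (cm)^{m/2}. \tag{10}$$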



Now, we consider i > n — 100 In n + 1. All of these squares are inside a square of side \Anrt/ \fn. So, 
we have | £[U_ 1001nn+1 X,| < 2^+ -^^^ ^, Now using E (E^-xooinn+i I^D m/2 < 
c(lnn) m//2 m m_em , we get (]C™ = i ^j) m — (cm) m//2 which by the usual argument via Markov in- 
equality, yields the tail bounds asserted. □ 



5 Minimum Weight Spanning tree 

This problem is tackled similarly to the TSP in the previous section. We will get the same result as
Talagrand's inequality is able to derive. The proof is more or less the same as our proof for the TSP,
except that there is an added complication because adding points does not necessarily increase the
weight of the minimum spanning tree. The standard example is when we already have the vertices
of an equilateral triangle and add the center to it.
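The equilateral triangle example can be verified numerically. The sketch below computes minimum spanning tree lengths with a plain implementation of Prim's algorithm (our choice of algorithm; the paper does not prescribe one) for a unit-side triangle, with and without its center. The names `mst_length`, `triangle` and `center` are ours.

```python
import math

def mst_length(points):
    """Length of a minimum weight spanning tree (Prim's algorithm, Euclidean plane)."""
    if len(points) <= 1:
        return 0.0
    dist = [math.dist(points[0], p) for p in points]   # distance to the current tree
    in_tree = [False] * len(points)
    in_tree[0] = True
    total = 0.0
    for _ in range(len(points) - 1):
        # Pick the closest point that is not yet in the tree.
        j = min((k for k in range(len(points)) if not in_tree[k]), key=dist.__getitem__)
        total += dist[j]
        in_tree[j] = True
        for k in range(len(points)):
            if not in_tree[k]:
                dist[k] = min(dist[k], math.dist(points[j], points[k]))
    return total

triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
center = (0.5, math.sqrt(3) / 6)
```

Without the center the tree uses two sides, of total length 2; with the center it uses the three spokes of length $1/\sqrt{3}$ each, of total length $\sqrt{3} < 2$.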
Theorem 4. Under the same hypotheses and notation as in Theorem 3, suppose $f = f(Y_1, Y_2, \ldots, Y_n)$
is the length of the minimum weight spanning tree on $Y_1 \cup Y_2 \cup \ldots \cup Y_n$. Then $f$ has $N(0,1)$ tails upto $\sqrt{m}$.

Proof If we already have a MWST for $Y \setminus Y_i$, we can again connect the point in $Y_{i+1} \cup \ldots \cup Y_n$
closest to $S_i$ to $S_i$, then add on a MWST on $Y_i$ to get a spanning tree on $Y$. This implies again
that $\Delta_i \le \tau_0 + c\sqrt{|Y_i|}/\sqrt{n}$. But now, we could have $f(Y) < f(Y^{(i)})$. We show that

Claim 2. $\Delta_i \ge -c_{10}\tau_0 - c\sqrt{|Y_i|}/\sqrt{n}$.

Proof We may assume that $Y_i \ne \emptyset$. Consider the MWST $T$ of $Y$. We call an edge
of the form $(x,y) \in T$ : $x \in Y_i$, $y \in Y \setminus Y_i$, with $|x - y| \ge c_9/\sqrt{n}$, a long edge and an edge
$(x,y) \in T$ : $x \in Y_i$, $y \in Y \setminus Y_i$, with $|x - y| < c_9/\sqrt{n}$ a short edge. It is well-known that the degree
of each vertex in $T$ is $O(1)$ (we prove a more complicated result in the next paragraph), so there are at
most $6|Y_i|$ short edges; we remove all of them and add a MWST on the non-$Y_i$ ends of them. Since
the edges are short, the non-$Y_i$ ends all lie in a square of side $O(1/\sqrt{n})$, so a MWST on them is of
length at most $O(\sqrt{|Y_i|}/\sqrt{n})$ by Claim 1.

We claim that there are at most $O(1)$ long edges. Indeed, if $(x,y), (w,z)$ are any two long edges
with $x, w \in Y_i$, we have $|y - z| \ge |x - y| - c/\sqrt{n}$, since otherwise $(T \setminus (x,y)) \cup (y,z) \cup (x,w)$ would
contain a better spanning tree than $T$. Similarly, $|y - z| \ge |w - z| - c/\sqrt{n}$. Let $x_0$ be the center of
square $S_i$. The above implies that in the triangle $x_0, y, z$, we have $|y - z| \ge |x_0 - y| - c/\sqrt{n}$ and
$|y - z| \ge |x_0 - z| - c/\sqrt{n}$.

But $|y - z|^2 = |y - x_0|^2 + |z - x_0|^2 - 2|y - x_0||z - x_0|\cos(y, x_0, z)$. Assume without loss of generality
that $|y - x_0| \ge |z - x_0|$. If the angle $y, x_0, z$ were less than 10 degrees, then we would have
$|y - z|^2 < |y - x_0|^2 + |z - x_0|^2 - 1.8|y - x_0||z - x_0| < (|y - x_0| - 0.4|z - x_0|)^2$, a contradiction. So, we
must have that the angle is at least 10 degrees, which implies that there are at most 36 long edges.

Let $a$ be the point in $Y_{i+1} \cup \ldots \cup Y_n$ closest to $S_i$ if $Y_{i+1} \cup \ldots \cup Y_n$ is non-empty; otherwise, let $a$
be the point in $Y_1 \cup Y_2 \cup \ldots \cup Y_{i-1}$ closest to $S_i$. We finally replace each long edge $(x,y)$, $x \in Y_i$, by the
edge $(a,y)$. This clearly only costs us $O(\tau_0)$ extra, proving the claim.

Now the proof of the theorem is completed analogously to the TSP. □ 



6 Chromatic Number of inhomogeneous random graphs 

Martingale inequalities have been used in different (beautiful) ways on the chromatic number $\chi$
of an (ordinary) random graph $G(n,p)$, where each edge is chosen independently to be in the graph with
probability p (see for example |3S],PI],P2!,|22],P3,0,|2]). 

Here we study the chromatic number in a more general model. An inhomogeneous random graph,
denoted $G(n,P)$, has vertex set $[n]$ and an $n \times n$ matrix $P = \{p_{ij}\}$ where $p_{ij}$ is the probability
that edge $(i,j)$ is in the graph. Edges are in/out independently. Let $p$
be the average edge probability. Let $\chi = \chi(G(n,P))$ be the chromatic number. Since each node can
change the chromatic number by at most 1, it is trivial to see that $\Pr(|\chi - E\chi| \ge t) \le c_1 e^{-c_2 t^2/n}$
by H-A. Here we prove the first non-trivial result, which is stronger than the trivial one when the
graph is sparse, i.e., when $p \in o(1)$.

Theorem 5. $\chi$ of $G(n,P)$ has $N(0, n\ln n\sqrt{p})$ tails upto $n\sqrt{p}$.

Remark 7. Given only $p$, note that $\chi$ could be as high as $\Omega(n\sqrt{p})$: for example, $p_{ij}$ could be $\Omega(1)$
for $i, j \in T$ for some $T$ with $|T| = \Theta(n\sqrt{p})$ and zero elsewhere.

Proof Let $p_i = \sum_j p_{ij}$ be the expected degree of $i$. Let

$$S = \{i : p_i \ge n\sqrt{p}\}.$$

Then $|S| \le 2n\sqrt{p}$. Split the $n - |S|$ vertices of $[n] \setminus S$ into $k = (n - |S|)\sqrt{p}$ groups $G_1, G_2, \ldots, G_k$ by
picking for each vertex a group uniformly at random independent of other vertices. It follows by
routine application of Chernoff bounds that with probability at least 1/2, we have: (i) for each $i$,
the sum of $p_{ij}$ over $j$ in the same group as $i$ is at most $O(\ln n)$ and (ii) $|G_t| \in O(\ln n/\sqrt{p})$ for all $t$. We choose any
partition of $[n] \setminus S$ into $G_1, G_2, \ldots, G_k$ satisfying (i) and (ii) at the outset and fix this partition. Then
we make the random choices to choose $G(n,P)$. We put the vertices of $S$ into singleton groups
$G_{k+1}, \ldots, G_{k+|S|}$.

Define $Y_i$ for $i = 1, 2, \ldots, k+|S|$ as the set of edges (of $G(n,P)$) in $G_i \times (G_1 \cup G_2 \cup \ldots \cup G_{i-1})$. We
can define the Doob's Martingale $X_i = E(\chi \mid Y_1, Y_2, \ldots, Y_i) - E(\chi \mid Y_1, Y_2, \ldots, Y_{i-1})$. First consider
$i = 1, 2, \ldots, k$. Define $\Delta_i$ as in Section 3. Let $d_j$ be the degree of vertex $j$ in $G_i$ in the graph induced on $G_i$
alone. $\Delta_i$ is at most $\max_{j \in G_i} d_j + 1$, since we can always color $G_i$ with this many additional colors. $d_j$
is the sum of independent Bernoulli random variables with $Ed_j = \sum_{l \in G_i} p_{jl} \le O(\ln n)$. By the remark on
general Chernoff bounds (see the end of Section 2), we have that $E(d_j - Ed_j)^l \le \mathrm{MAX}((cl\ln n)^{l/2}, (cl)^l)$.
Hence, $E^{i-1}(\Delta_i^l) \le (cl)^l + (cl\ln n)^{l/2}$.
We will apply Theorem 1 to the sum

$$\frac{c_7X_1}{\sqrt{\ln n}} + \frac{c_7X_2}{\sqrt{\ln n}} + \cdots + \frac{c_7X_k}{\sqrt{\ln n}}.$$

It follows from the above that these satisfy the hypothesis of the Theorem provided $m \le k$. From
this, we get that

$$E\Big(\sum_{i=1}^{k} X_i\Big)^m \le (cmk\ln n)^{m/2}.$$

For $i = k+1, \ldots, k+|S|$, the $\Delta_i$ are absolutely bounded by 1, so by the Theorem
$E(X_{k+1} + X_{k+2} + \cdots + X_{k+|S|})^m \le (c|S|m)^{m/2}$. Thus,

$$E\Big(\sum_{i=1}^{k+|S|} X_i\Big)^m \le (cmk\ln n)^{m/2}.$$

Let $t \in (0, n\sqrt{p})$. We take $m$ = the even integer nearest to $t^2/(c\,n\sqrt{p}\ln n)$ to get the theorem.

□ 



7 Random Projections 

A famous theorem of Johnson-Lindenstrauss [S] asserts that if $v$ is picked uniformly at random
from the surface of the unit ball in $R^n$, then for $k \le n$ and $\epsilon \in (0,1)$, $\sum_{i=1}^k v_i^2$ (see footnote 2) has $N(0, k/n^2)$ tails
upto $k/n$.

The original proof exploits the details of the uniform density and simpler later proofs ( [H] , |2U] , 
|25j ) use the Gaussian in the equivalent way of picking v. Here, we will prove the same conclusion 



2 A clearly equivalent statement talks about the length of the projection of a fixed unit length vector onto a random 
k— dimensional sub-space. 
under weaker hypotheses which again allow longer tails (and so does not use any special property
of the uniform or the Gaussian). This is the first application which uses the Strong Negative
Correlation condition rather than the Martingale Difference condition.

Theorem 6. Suppose $Y = (Y_1, Y_2, \ldots, Y_n)$ is a random vector picked from a distribution such that
(for a $k \le n$) (i) $E(Y_i^2 \mid Y_1^2 + Y_2^2 + \cdots + Y_{i-1}^2)$ is a non-increasing function of $Y_1^2 + Y_2^2 + \cdots + Y_{i-1}^2$ for
$i = 1, 2, \ldots, k$ and (ii) for even $l \le k$, $E(Y_i^l \mid Y_1^2 + Y_2^2 + \cdots + Y_{i-1}^2) \le (cl)^{l/2}/n^{l/2}$. Then, $\sum_{i=1}^{k} Y_i^2$ has
$N(0, k/n^2)$ tails upto $k/n$.

Proof The theorem will be applied with $X_i = Y_i^2 - EY_i^2$. First, (i) implies for odd
$l$: $EX_i(X_1 + X_2 + \cdots + X_{i-1})^l \le 0$, by (an elementary version of), say, the FKG inequality. [If
$X_1 + X_2 + \cdots + X_{i-1} = W$, then since $W^l$ is an increasing function of $W$ for odd $l$ and $E(X_i \mid W)$
a non-increasing function of $W$, we have $EX_iW^l = E_W(E(X_i \mid W)W^l) \le E_W(E(X_i \mid W))\,EW^l =
EX_i\,EW^l = 0$.] Now, for even $l$, $E(X_i^l) \le 2^l EY_i^{2l} + 2^l (EY_i^2)^l \le (cl)^l/n^l$. So we may apply
the theorem to the scaled variables $c\,nX_i$, for $i = 1, 2, \ldots, k$, for $m \le k$ to get that $c\,n\sum_{i=1}^{k} X_i$ has
$N(0, k)$ tails upto $O(\sqrt{k \cdot k}) = O(k)$. So, $\sum_{i=1}^{k} X_i$ has $N(0, k/n^2)$ tails upto $O(k/n)$ as claimed. □

Question A common use of J-L is the following: suppose we have $N$ vectors $v_1, v_2, \ldots, v_N$ in $\mathbf{R}^n$,
where $n, N$ are high. We wish to project the $v_i$ to a space of dimension $k \ll n$ and still preserve all
distances $|v_i - v_j|$. Clearly, J-L guarantees that for one $v_i - v_j$, if we pick a random $k$-dimensional
space, its length is more or less preserved (within a scaling factor). Since the tail probabilities
fall off exponentially in $k$, it suffices to take $k$ a polynomial in $\log N$ to ensure all distances are
preserved. In this setting, it is useful to find more general choices of random subspaces (instead of
picking them uniformly at random from all subspaces) and there has been some work on this ([8],
p] , [3] ) . The question is whether Theorem 1 here or the Main Theorem can be used to derive more
general results.
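
The concentration asserted by J-L can be observed numerically. The following is only a sketch under stated assumptions: it draws $v$ uniformly from the unit sphere by normalising a Gaussian vector (the "equivalent way of picking $v$" mentioned above), and the function names, the sample sizes and the seed are illustrative choices, not part of the paper.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform point on the unit sphere in R^n, obtained by normalising Gaussians."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def squared_length_of_first_k(v, k):
    """sum_{i<=k} v_i^2, the squared length of the projection onto the first k axes."""
    return sum(x * x for x in v[:k])

def experiment(n=200, k=50, trials=2000, seed=0):
    """Empirical mean and variance of sum_{i<=k} v_i^2 over independent draws of v."""
    rng = random.Random(seed)
    samples = [squared_length_of_first_k(random_unit_vector(n, rng), k)
               for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    return mean, var

mean, var = experiment()
print("empirical mean:", mean, " k/n:", 50 / 200)
print("empirical variance:", var, " k/n^2:", 50 / 200 ** 2)
```

The empirical mean should sit near $k/n$ and the empirical variance should be of the order $k/n^2$, in line with the $N(0, k/n^2)$ scale in the statement.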



8 Main Probability Inequality 

Now, we come to the main theorem. We will again assume Strong Negative Correlation ([!]) of the
real-valued random variables $X_1, X_2, \ldots, X_n$. The first main point of departure from Theorem 1 is
that we allow different variables to have different bounds on conditional moments. A more important
point will be that we will use information on conditional moments conditioned on "typical"
values of previous variables as well as the pessimistic "worst-case" values. More specifically, we
assume the following bounds on moments for $i = 1, 2, \ldots, n$ ($m$ again is an even positive integer):

$$E(X_i^l \mid X_1 + X_2 + \cdots + X_{i-1}) \le M_{il} \quad \text{for } l = 2, 4, 6, 8, \ldots, m. \qquad (11)$$

In some cases, the bound $M_{il}$ may be very high for the "worst-case" $X_1 + X_2 + \cdots + X_{i-1}$. We will
exploit the fact that for a "typical" $X_1 + X_2 + \cdots + X_{i-1}$, $E(X_i^l \mid X_1 + X_2 + \cdots + X_{i-1})$ may be much
smaller. To this end, suppose

$$\mathcal{E}_{i,l}, \quad l = 2, 4, 6, \ldots, m; \quad i = 1, 2, \ldots, n$$

are events. $\mathcal{E}_{il}$ is to represent the "typical" case. $\mathcal{E}_{1l}$ will be the whole sample space. In addition
to (11), we assume that

$$E(X_i^l \mid X_1 + X_2 + \cdots + X_{i-1}, \mathcal{E}_{il}) \le L_{il} \qquad (12)$$

$$\Pr(\mathcal{E}_{il}) = 1 - \delta_{il} \quad \text{for } l = 2, 4, 6, 8, \ldots, m \qquad (13)$$
Two quantities play a role in the theorem. The first is the "average typical $l$-th moment" $L_l$ which
we define as

$$L_l = \frac{1}{n}\sum_{i=1}^{n} L_{il} \quad \text{for } l = 2, 4, 6, 8, \ldots, m.$$

The second has to do with worst-case moments, but modulated by $\delta_{il}$. Let

$$\hat{M}_{il} = M_{il}\,\delta_{il}^{2/(m-l+2)}.$$

Note that while $M_{il}$ may be very large, one can make $\hat{M}_{il}$ smaller by controlling $\delta_{il}$.

Theorem 7 (Main Theorem). Let $X_1, X_2, \ldots, X_n$ be real valued random variables satisfying Strong
Negative Correlation and $m$ be a positive even integer and $L_l, M_{il}, \delta_{il}$ be as above. Then for

EX ~ <_ (cm) m (E ^f] " n + (mr £ _L £ Ka )" /a ■ 

\l=l v 7 / Z=l i=l 

Besides the distinction between typical case and worst-case conditional moments which we
already mentioned, a second feature of the Theorem is similar to Theorem 1 in that the second
moment term will often be the important one. The $L$ term on the right hand side of the theorem
is at most

$$(cm)^{m/2}\left(nL_2 + \sqrt{nm}\,L_4^{1/2} + \cdots\right)^{m/2},$$

where we note that for $m \ll n$ (which is the usual parameter setting with which the theorem will
be applied) the coefficients of higher moments decline fast, so that under reasonable conditions, the
$nL_2$ term is what matters. In this case, it will not be difficult to see that we get $N(0, nL_2)$ tails, as
we would in the ideal case when the $X_i$ are independent and in the limit $X_1 + X_2 + \cdots + X_n$ behaves
like the normal (with variance equal to the sum of the variances of the $X_i$, namely $nL_2$).

Remark 8. The general Chernoff bounds are a very special case: suppose $X_i$, $i = 1, 2, \ldots, n$ are
independent Bernoulli trials with $EX_i = \nu_i$. We will apply the theorem to bound the $m$-th moment
of $X = \sum_i (X_i - \nu_i)$ and from that the tail probability. It is easy to see that $E(X_i - \nu_i)^l \le \nu_i$ for
all even $l$, so we may take $L_{i,2l} = \nu_i$ to satisfy the hypothesis of the Theorem for every $m$. Let
$\sum_i \nu_i = \nu$. We get

$$EX^m \le (cm)^{m/2}\left(m\sum_{l}\frac{1}{l^2}\left(\frac{\nu}{m}\right)^{1/l}\right)^{m/2}.$$

The maximum of $(\nu/m)^{1/l}$ occurs at $l = 1$ if $\nu \ge m$ and at $l = m/2$ otherwise; in any case, it is at
most $1 + (\nu/m)$ and so we get (using $\sum_l (1/l^2) \le 4$), for any $t > 0$,

$$\Pr(|X| \ge t) \le \frac{EX^m}{t^m} \le \left(\frac{cm(\nu + m)}{t^2}\right)^{m/2}.$$

Now putting $m = \frac{t^2}{c(\nu+t)}$, we get $\Pr(|X| \ge t) \le e^{-ct^2/(\nu+t)}$, which are Chernoff bounds.
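
As a sanity check on the shape $e^{-ct^2/(\nu+t)}$, one can compare an exact binomial tail with a bound of that form. The sketch below is an illustration only: it uses the classical Bernstein-type bound $\exp(-t^2/(2(\nu + t/3)))$, which is a standard inequality swapped in here and not the bound derived in this remark, and the parameters $n = 100$, $p = 0.1$ are arbitrary.

```python
import math

def binomial_upper_tail(n, p, k):
    """Exact Pr(Bin(n, p) >= k)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def bernstein_bound(nu, t):
    """Classical Bernstein-type bound exp(-t^2 / (2 (nu + t/3))) on Pr(X - nu >= t)."""
    return math.exp(-t * t / (2.0 * (nu + t / 3.0)))

n, p = 100, 0.1
nu = 10.0  # nu = sum of the nu_i = n * p for identical trials
for t in (2, 5, 10, 20):
    exact = binomial_upper_tail(n, p, int(nu) + t)
    print(t, exact, bernstein_bound(nu, t))
```

For small $t$ the bound behaves like $e^{-t^2/2\nu}$ (Gaussian regime) and for large $t$ like $e^{-ct}$, which is the two-regime behaviour the exponent $t^2/(\nu+t)$ encodes.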
9 Proof of the Main Theorem

[The proof is complicated, not for lack of efforts on the part of the author. While certainly some 
of the intricate use of inequalities to get things to the final form which is usable may be necessary, 
it is possible that the reader may be luckier in simplifying the proof.] 

We will use induction on $n, m$. At a general step of the argument, we will need to bound
$E\left(\sum_{i=1}^{r} X_i\right)^q$, where $r \le n$ and $q \le m$, even. To bound this, let $A = X_1 + X_2 + \cdots + X_{r-1}$.
Binomial expansion gives us

$$E\left(\sum_{i=1}^{r} X_i\right)^q = E(A + X_r)^q = EA^q + q\,EX_rA^{q-1} + \sum_{l=2}^{q}\binom{q}{l}EX_r^lA^{q-l}.$$

The second term is non-positive by hypothesis. Also arguing exactly as in the proof of Theorem
1, for odd $l \ge 3$,

EX\.A q - 1 < -{EX l r +l A q - 1 - 1 + EX!r 1 A q - l+1 ), 
5 

and so we get

$$E\left(\sum_{i=1}^{r} X_i\right)^q \le EA^q + 3\sum_{l \ge 2,\ l \text{ even}}\binom{q}{l}EX_r^lA^{q-l}. \qquad (14)$$

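Without confusion, we will use $\mathcal{E}_{rl}$ to mean the 0-1 indicator variable of the event (defined earlier)
$\mathcal{E}_{rl}$. Then, for even $l \ge 2$, we get

$$EX_r^lA^{q-l} = EX_r^lA^{q-l}\mathcal{E}_{rl} + EX_r^lA^{q-l}(1 - \mathcal{E}_{rl}) \le L_{rl}\,EA^{q-l} + M_{rl}\,EA^{q-l}(1 - \mathcal{E}_{rl})$$

$$\le L_{rl}\,EA^{q-l} + M_{rl}\left(EA^{q-l+2}\right)^{\frac{q-l}{q-l+2}}\left(E(1 - \mathcal{E}_{rl})\right)^{\frac{2}{q-l+2}} \quad \text{(Hölder)}$$

$$\le L_{rl}\,EA^{q-l} + M_{rl}\,\delta_{rl}^{\frac{2}{m-l+2}}\left(EA^{q-l+2}\right)^{\frac{q-l}{q-l+2}} \quad \text{since } m \ge q$$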



< L rl EA q - 1 + {£Q q2 - l+2) {3m 2 n 2 ' l )^^j 



( (g-0(i-2) 1 

M r l {9 ~ l+2) ( EA q ~ l+2 ) 



V 



(3m 2 n 2 / ! )H+s 



< L rl EA^ + Mf (3m 2 n 2 ^ + M \ , 

where, in the last step, we have used Young's inequality which says that for any $a, b \ge 0$ real and
$s, r > 0$ with $\frac{1}{s} + \frac{1}{r} = 1$, we have $ab \le a^s + b^r$; we have applied this with $s = (q - l + 2)/2$ and
$r = (q - l + 2)/(q - l)$.

Plugging this into (14), we get: 

(r \i g j — l 
E*« M E^(E^r'> 
i=l J l>0 i=l 

even 
$ = i + 3-Vr'i. /-«



3m 2 n \ 2 



3[ , )L rl + 3( , * „ ) Affir m „ 2 < J < (? - 2 



- 3L rg + 3 E ( ? )^(' 1 (3m 2 )^-' l) /V9- Il )/ ,1 ) Z = g. 



even 

It is easy to see that 



4? < «w = 1 + 1 , I = 

" n 



$ < = 3( 7 ) (X* + M^ 2) ^ 2/( ' +2) ) - ' 1 



3^ + 3 E 



< a rq = 3L rq + 3 y * M^(3m 2 )^)/2 n ( 3 -^)A. 



even 

It is important to make $a_{r0}$ not be much greater than 1 because in this case only $n$ is reduced and
so in the recurrence, this could happen $n$ times. Note that except for $l = q$, the other $a_{rl}$ do not
depend upon $q$; we have used $a_{rq}$ to indicate this extra dependence. With this, we have

(r \ 1 q-2 r-l 

1=1 J l>0 i=l 

even 

We wish to solve these recurrences by induction on $r, q$. Intuitively, we can imagine a directed
graph with root marked $(n, m)$. The root has $\frac{m}{2} + 1$ children which are marked $(n - 1, m - l)$
for $l = 0, 2, \ldots, m$; the node marked $(r, q)$ is trying to bound $E\left(\sum_{i=1}^{r} X_i\right)^q$. There are also weights
$a_{rl}$ on the edges. The graph keeps going until we reach the leaves, which are marked $(1, *)$ or
$(r, 0)$. This is very similar to the recursion tree picture accompanying the proof of Theorem 1.
It is intuitively easy to argue that the bound we are seeking at the root is the sum over all paths
from the root to the leaves of the product of the edge weights on the path. We formalize this in a
lemma.
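
The "sum over paths of products of edge weights" picture can be checked on a toy recurrence. This is a sketch under assumptions, not the recurrence of the proof: the edge weights in `weight` are made-up non-negative numbers standing in for $a_{rl}$, the leaf $(r, 0)$ is given value 1 and the leaf $(1, q)$ is given value `weight(1, q)`.

```python
from functools import lru_cache

def weight(r, l):
    """Hypothetical edge weight standing in for a_{r,l}; any non-negative choice works."""
    return 1.0 + 1.0 / (r + 1) if l == 0 else 1.0 / (r * l)

@lru_cache(maxsize=None)
def bound(r, q):
    """Solve B(r, q) = sum over even l of weight(r, l) * B(r - 1, q - l) top-down."""
    if q == 0:
        return 1.0               # leaf marked (r, 0)
    if r == 1:
        return weight(1, q)      # leaf marked (1, *)
    return sum(weight(r, l) * bound(r - 1, q - l) for l in range(0, q + 1, 2))

def paths(r, q):
    """Yield every root-to-leaf path from (r, q) as a list of (r, l) edges."""
    if q == 0:
        yield []
        return
    if r == 1:
        yield [(1, q)]
        return
    for l in range(0, q + 1, 2):
        for tail in paths(r - 1, q - l):
            yield [(r, l)] + tail

def path_sum(r, q):
    """Sum over all paths of the product of the edge weights on the path."""
    total = 0.0
    for path in paths(r, q):
        product = 1.0
        for rr, l in path:
            product *= weight(rr, l)
        total += product
    return total

print(bound(5, 6), path_sum(5, 6))
```

Both numbers agree up to floating point rounding, which is the content of the intuition that the lemma below makes precise.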

For doing that, for $1 \le r \le n$; $2 \le q \le m$, $q$ even and $1 \le i \le r$ define $S(r, q, i)$ as the set of
$s = (s_i, s_{i+1}, s_{i+2}, \ldots, s_r)$ with $s_i > 0$; $s_{i+1}, s_{i+2}, \ldots, s_r \ge 0$ and $\sum_{j=i}^{r} s_j = q$, $s_j$ even.

Lemma 1. For any 1 < r < n and any q < m even, we have 



a j,Sj ■ 



£(5»'<£ E n 

i=l i=l sES(r,q,i) .7=1+1 

Proof Indeed, the statement is easy to prove for the base case of the induction, $r = 1$, since
$\mathcal{E}_{1l}$ is the whole sample space and $EX_1^q \le L_{1q}$. For the inductive step, we proceed as follows.

r q-2 r-l 

1=1 s r >0 i = \ 

even 



) — 1 q—2 r—l 

< a r ,g + E E ar ^ E ^ II 

i=l s r >o seS(r—l,q—s r ,i) j=i+l 

even v ' ' 



a 3,Sj ■ 
We clearly have $S(r, q, r) = \{q\}$ and for each fixed $i$, $1 \le i \le r - 1$, there is a 1-1 map
$S(r-1, q, i) \cup S(r-1, q-2, i) \cup \ldots \cup S(r-1, 2, i) \to S(r, q, i)$ given by
$s = (s_i, s_{i+1}, \ldots, s_{r-1}) \to s' = (s_i, \ldots, s_{r-1}, q - \sum_{j=i}^{r-1} s_j)$ and it is easy to see from this that we have
the inductive step, finishing the proof of the Lemma. □

The "sum of products" form in the lemma is not so convenient to work with. We will now
get this to the "sum of moments" form stated in the Theorem. This will require a series of
(mainly algebraic) manipulations with ample use of Young's inequality, the inequality asserting
$(a_1 + a_2 + \cdots + a_r)^q \le r^{q-1}(a_1^q + a_2^q + \cdots + a_r^q)$ for positive reals $a_1, a_2, \ldots$ and $q \ge 1$, and others.

So far, we have (moving the $l = 0$ terms separately in the first step)

(n \ m / n \ n n 

i=l / \i=l / i=l seS(n,m,») 



^ 3 E e ^ n 

i=l seS(n,m,i) J=i+i 

<3E E ( 15 ) 

t>l \i=l / sGQ(m-2t) 3=1 

where $Q(q) = \{s = (s_1, s_2, \ldots, s_n) : s_i \ge 0 \text{ even};\ \sum_j s_j = q\}$.

Fix $q$ for now. For $s \in Q(q)$, $l = 0, 1, 2, \ldots, q/2$, let $T_l(s) = \{j : s_j = 2l\}$ and $t_l(s) = |T_l(s)|$. Note
that $\sum_{l=0}^{q/2} l\,t_l(s) = q/2$. Call $t(s) = (t_0(s), t_1(s), t_2(s), \ldots, t_{q/2}(s))$ the "signature" of $s$. In the special
case when $a_{il}$ is independent of $i$, the signature clearly determines the "$s$ term" in the sum (15).
For the general case too, it will be useful to group terms by their signature. Let (the set of possible
signatures) be $T$. [$T$ consists of all $t = (t_0, t_1, t_2, \ldots, t_{q/2})$ with $t_l \ge 0$; $\sum_{l=1}^{q/2} l\,t_l = q/2$; $\sum_{l=0}^{q/2} t_l = n$.]

n Ti partition [n] q/2 

Now, n a M = e e n n °^ 

seQ(g) teT ro,Ti,T2,...T„/ 2 :|T;[=tj Z=l ieT; 

9/2 / n \*i 

teT 1=1 L ' \i=l / 

since the expansion of $\left(\sum_i a_{i,2l}\right)^{t_l}$ contains $t_l!$ copies of $\prod_{i \in T_l} a_{i,2l}$ (as well as other terms we do not
need.) Now define $R = \{r = (r_1, r_2, \ldots, r_{q/2}) : r_l \ge 0;\ \sum_l r_l = q/2\}$. We have



n. 



9/2 / n \ *i 1 « 

teT 1=1 1 \i=l / rei? I K 11 ' i=l 

9/2 



^(e^-(e-) 1 ") 



where the first inequality is seen by substituting $r_l = t_l l$ and noting that the terms corresponding
to the $r$ such that $l \mid r_l$ for all $l$ are sufficient to cover the previous expression and the other terms are
non-negative. To see the second inequality, we just expand the last expression and note that the
expansion contains $\prod_{l}\left(\sum_i a_{i,2l}\right)^{r_l/l}$ with coefficient $\binom{q/2}{r_1, r_2, \ldots, r_{q/2}}$ for each $r \in R$. Now, it only remains

to see that m n(1 ~ (1/ ' );i > 



(n/iy. 



which is obvious. Thus, we have plugging in (16) into (15), (for some constant $c > 0$; recall $c$ may
stand for different constants at different points):



EX m < c n 




a i,2l 



m i m 



in i m 



Tfi TTl 

Now, (— - t)\ > (— - t)T-' e -T e t > mT _t e"^Min t 



t\ 2" 



in 



>m2 r (2e) 2 



the last using Calculus to differentiate the log of the expression with respect to t to see that the 
min is at t = 0. Thus, 



EX m < c m J2 



3 

m 



^m 1 T (^a Ji2/ 



=1 



,i=i 



Let $\alpha, \beta$ denote the quantities in the 2 square brackets respectively. Young's inequality gives us:
$\alpha\beta \le \alpha^{m/(m-2t)} + \beta^{m/2t}$. Thus,



-i 



i=l \ i / 



+e "Me 



In what follows, let Zi run over even values to m and i run from 1 to n. 



(17) 



EE** ^ cm E E L 



i.2t 



t=l \ i 



+ c m m m 



^-1 ( n 



, n * — ' < — ' \hm 

t \ i h<2t v 



2t 



{nM l>h ) 2t ' h < 



^EE^^+^E^IE^) 2 ^] 

t % t,h 1 \ i / 

< c ™ EE + cV E 4r E^^r 7 ' 1 . 



(18) 
(using $t^{m/2t} \le c^m$.)



1=1 



E 

1=1 



_i / m 

m 1 



21 \ i 



(20! 



i 

JW » 2Z+2 



i --i 



m 



m 



E 

i=i 



m i 



i 

f+i 



i 



Z=l \ i / «=2 v y 

We will further bound the last term using Holder's inequality: 

m 



2/ 



/=2 



(Z-l) 2 



' oo \ (m-2)/2 | 



(.2/ 



(19) 



(20) 



Now plugging ( |19|18|20[ ) into |l7| and noting that cm 2 "^ /I 2 > 1, we get the Theorem. □ 



10 Bin Packing 

Now we tackle bin packing. The input consists of $n$ i.i.d. items $Y_1, Y_2, \ldots, Y_n \in (0, 1)$. Suppose
$EY_i = \mu$ and $\mathrm{Var}\,Y_i = \sigma^2$. Let $f = f(Y_1, Y_2, \ldots, Y_n)$ be the minimum number of capacity 1 bins into
which the items $Y_1, Y_2, \ldots, Y_n$ can be packed. It was shown (after many successive developments)
using non-trivial bin-packing theory ([37]) that $f$ has $N(0, n(\mu^2 + \sigma^2))$ tails upto $O(n(\mu^2 + \sigma^2))$.
Talagrand [13] gives a simple proof of this from his inequality (this is the first of the six or so
examples in his paper.) [We can also give a simple proof of this from our theorem.]

Talagrand [13] says (in our notation) "especially when $\mu$ is small, one expects that the behavior
of $f$ resembles the behavior of $\sum_{i=1}^{n} Y_i$. Thereby, one should expect that $f$ should have tails of
$N(0, n\sigma^2)$ or, at least, less ambitiously, $N(0, n(\mu^2 + \sigma^2))$".

However, $N(0, n\sigma^2)$ (as for sums of independent random variables) is easily seen to be impossible.
An example is when items are of size $1/k$ or $(1/k) + \varepsilon$ ($k$ a positive integer and $\varepsilon \ll 1/k$ a
positive real) with probability 1/2 each; here $\sigma$ is $O(\varepsilon)$. It is clear that the number $n_1$ of $1/k$ items can
be in $\frac{n}{2} \pm \Omega(\sqrt{n})$. Now, a bin can have at most $k - 1$ items if it has any $(1/k) + \varepsilon$ item; it can
have $k$ items if they are all $1/k$. Thus if $n_1$ is the number of $1/k$ items, we get

$$f \approx \frac{n_1}{k} + \frac{n - n_1}{k - 1} = \frac{n}{2}\left(\frac{1}{k} + \frac{1}{k-1}\right) \pm \Omega\left(\frac{\sqrt{n}}{k^2}\right).$$

From this it can be seen that the standard deviation of $f$ is $\Omega(\sqrt{n}\,\mu^2) \gg \sqrt{n}\,\sigma$, establishing what
we want.
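
The arithmetic of this example is easy to reproduce. The sketch below is only an illustration: it uses the fractional count $n_1/k + (n - n_1)/(k-1)$ and ignores rounding to whole bins, and the values $n = 10000$, $k = 5$ and the shift of 100 (about $\sqrt{n}$) are arbitrary choices.

```python
from fractions import Fraction

def fractional_bins(n1, n, k):
    """n1 items of size 1/k pack k per bin; the other n - n1 items pack k - 1 per bin."""
    return Fraction(n1, k) + Fraction(n - n1, k - 1)

n, k = 10_000, 5
shift = 100  # a fluctuation of order sqrt(n) in the number of small items
base = fractional_bins(n // 2, n, k)
moved = fractional_bins(n // 2 + shift, n, k)
print(float(base - moved))  # shift / (k (k - 1)) = 5.0
```

A fluctuation of order $\sqrt{n}$ in $n_1$ thus moves the bin count by about $\sqrt{n}/k^2$, independently of how small $\varepsilon$ (and hence $\sigma$) is.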

Here we prove the best possible interval of concentration when the items take on only one of a
fixed finite set of values (discrete distributions, a case which has received much attention in the
literature, for example [19] and references therein). [While our proof of the upper bound here is
only for problems with a fixed finite number of types, it would be nice to extend this to continuous
distributions.]

Theorem 8. Suppose Y\,Y2, ■ ■ -Y n are i.i.d. drawn from a discrete distribution with r atoms, 
r G O(l), each with probability at least j^^;- Let EY\ = /j, < r2 ^ - and VarY; = a 2 . Then for any 
$t \in (0, n(\mu^3 + \sigma^2))$, we have

$$\Pr(|f - Ef| \ge t + r) \le c_1 e^{-ct^2/(n(\mu^3 + \sigma^2))}.$$

Further, there is a distribution for $Y_i$ in which $\mathrm{Var}(f) \in \Omega(n(\mu^3 + \sigma^2))$.

Proof Let item sizes be $\zeta_1, \zeta_2, \ldots, \zeta_j, \ldots, \zeta_r$ and the probability of picking type $j$ be $p_j$. [We
will reserve $j$ to denote the $j$-th item size.] We have: mean $\mu = \sum_j p_j\zeta_j$ and standard deviation
$\sigma = \left(\sum_j p_j(\zeta_j - \mu)^2\right)^{1/2}$.

Note that if $\mu \le r/\sqrt{n}$, then earlier results already give concentration in an interval of length
$O(\sqrt{n}(\mu + \sigma))$ which is then $O(r + \sqrt{n}\,\sigma)$, so there is nothing to prove. So assume that $\mu \ge r/\sqrt{n}$.

Define a "bin type" as an $r$-vector of non-negative integers specifying the number of items of
each type which are together packable into one bin. If bin type $i$ packs $a_{ij}$ items of type $j$ for
$j = 1, 2, \ldots, r$ we have $\sum_j a_{ij}\zeta_j \le 1$. Note that $s$, the number of bin types, depends only on the $\zeta_j$, not
on $n$.
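
To make the notion of bin type concrete, the following sketch enumerates all bin types for a small set of item sizes. It is an illustration under assumptions: the function name `bin_types` is hypothetical, sizes are assumed strictly positive and given as exact fractions, and the all-zero vector (the empty bin) is excluded by choice.

```python
from fractions import Fraction

def bin_types(sizes):
    """All non-zero integer vectors (a_1, ..., a_r) with sum_j a_j * sizes[j] <= 1."""
    r = len(sizes)
    result = []

    def extend(j, counts, remaining):
        # counts holds a_1..a_j; remaining is the unused capacity of the bin.
        if j == r:
            if any(counts):
                result.append(tuple(counts))
            return
        a = 0
        while a * sizes[j] <= remaining:
            extend(j + 1, counts + [a], remaining - a * sizes[j])
            a += 1

    extend(0, [], Fraction(1))
    return result

sizes = [Fraction(1, 2), Fraction(1, 3)]
types = bin_types(sizes)
print(len(types), types)
```

For sizes $1/2$ and $1/3$ this yields six bin types, for instance $(1, 1)$ for one item of each size; the count depends only on the sizes, not on the number of items $n$.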

For any set of given items, we may write a Linear Programming relaxation of the bin packing
problem whose answers are within additive error $r$ of the integer solution. If there are $n_j$ items of
size $\zeta_j$ in the set, the Linear Program, which we call "Primal" (since later we will take its dual) is:

Primal: ($x_i$ = number of bins of type $i$.)

$$\text{Min } \sum_{i=1}^{s} x_i \quad \text{subject to} \quad \sum_{i=1}^{s} x_i a_{ij} \ge n_j \ \ \forall j; \quad x_i \ge 0.$$

Since an optimal basic feasible solution has at most $r$ non-zero variables, we may just round these
$r$ up to integers to get an integer solution; thus the additive error is at most $r$ as claimed. In what
follows, we prove concentration not for the integer program's value, but for the value of the Linear
Program. The Linear Program has the following dual:

$$\text{Max } \sum_{j=1}^{r} n_j y_j \quad \text{s.t.} \quad \sum_{j} a_{ij} y_j \le 1 \ \text{ for } i = 1, 2, \ldots, s; \quad y_j \ge 0.$$

($y_j$ may be interpreted as the "imputed" size of item $j$.) Let $Y = (Y_1, Y_2, \ldots, Y_n)$ and (for an $i$ we
fix attention on) $Y' = (Y_1, Y_2, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n)$. We denote by $f(Y)$ the value of the Linear
Program for the set of items $Y$. Let $\Delta_i = f(Y) - f(Y')$. The typical events $\mathcal{E}_i$ will just be that the
number of copies of each $\zeta_j$ among $Y_1, Y_2, \ldots, Y_{i-1}$ is close to its expectation:

$$\mathcal{E}_i : \ \forall j, \ \left|\text{no. of copies of } \zeta_j \text{ in } Y_1, Y_2, \ldots, Y_{i-1} - (i-1)p_j\right| \le 100\sqrt{m \ln(10m/\mu)\, p_j (i-1)},$$

where $m$ is to be specified later, but will satisfy $m \le c\,n(\mu^3 + \sigma^2)$. We will use the Theorem with this
parameter $m$.

We will make crucial use of the fact that second moments count highly for the bound in the theorem.
So the main technical part of the proof is the following Lemma bounding typical conditional
second moments.

Lemma 2. Under $\mathcal{E}_i$, $\mathrm{Var}(\Delta_i \mid Y_1, Y_2, \ldots, Y_{i-1}) \in O(\mu^3 + \sigma^2)$.

Proof Suppose now, we have already chosen all but $Y_i$. Now, we pick $Y_i$ at random; say
$Y_i = \zeta_k$. Let $Y = (Y_1, Y_2, \ldots, Y_n)$ and $Y' = (Y_1, Y_2, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n)$.
Let

$$\Delta_i = f(Y) - f(Y').$$

Suppose we have the optimal solution of the LP for $Y'$. There is a bin type which packs $\lfloor 1/\zeta_k \rfloor$
copies of item of type $k$; let $i_0$ be the index of this bin type. Clearly if we increase $x_{i_0}$ by $\frac{1}{\lfloor 1/\zeta_k \rfloor}$, we
get a feasible solution to the new primal LP for $Y$. So $0 \le \Delta_i \le \frac{1}{\lfloor 1/\zeta_k \rfloor} \le \zeta_k + 2\zeta_k^2$, which implies

$$E(\Delta_i^2 \mid Y') \le \sum_j p_j(\zeta_j + 2\zeta_j^2)^2 \le \sum_j p_j\zeta_j^2 + 8\sum_j p_j\zeta_j^3$$

$$\le \mu^2 + \sigma^2 + 8 \times 8\sum_j p_j|\zeta_j - \mu|^3 + 8 \times 8\sum_j p_j\mu^3 \le \mu^2 + 65\sigma^2 + 64\mu^3. \qquad (21)$$

Now, we lower bound $\Delta_i$ by looking at the dual. For this, let $y$ be the dual optimal solution for
$Y'$. (Note: Thus, $y = y(Y')$ is a function of $Y'$.) $y$ is feasible to the new dual LP too (after adding
in $Y_i$), since the dual constraints do not change. So, we get: $\Delta_i \ge y_k$.

$$E(\Delta_i \mid Y') \ge \sum_j p_j y_j(Y'). \qquad (22)$$

Also, recalling the bin type $i_0$ defined earlier, we see that $y_k \le \frac{1}{\lfloor 1/\zeta_k \rfloor} \le \zeta_k + 2\zeta_k^2$. Say the
number of items of type $j$ in $Y'$ is $(n-1)p_j + \gamma_j$. It is easy to see that $\zeta$ is a feasible dual solution.
Since $y$ is an optimal solution, we have

$$\sum_j ((n-1)p_j + \gamma_j)\,y_j \ge \sum_j ((n-1)p_j + \gamma_j)\,\zeta_j.$$



J 3 \ j j 3 

<^(E«/w)) 1/2 (Ew(w-o) 2 ) 1/2 



< 32 ^^ MAX j | 7j /^|, (23) 

where we have used the fact that $-\zeta_j \le y_j - \zeta_j \le 2\zeta_j^2 \le 2\zeta_j$. Let $(i-1)p_j + \gamma'_j$ and $(n-i)p_j + \gamma''_j$
respectively be the number of items of size $\zeta_j$ among $Y_1, Y_2, \ldots, Y_{i-1}$ and $Y_{i+1}, \ldots, Y_n$. Since $\gamma''_j$ is the
sum of $n - i$ i.i.d. random variables, each taking on value $-p_j$ with probability $1 - p_j$ and $1 - p_j$
with probability $p_j$, we have $E(\gamma''_j)^2 = \mathrm{Var}(\gamma''_j) \le np_j$. Now, we wish to bound the conditional
moment of $\gamma'_j$ conditioned on $Y_1, Y_2, \ldots, Y_{i-1}$. But under the worst-case conditioning, this can be very
high. [For example, all fractions upto $i - 1$ could be of the same type.] Here we exploit the typical
case conditioning. The expected number of "successes" in the $i - 1$ Bernoulli trials is $p_j(i-1)$. By
using Chernoff, we get (recall the definition of $\mathcal{E}_i$) $\Pr(\neg\mathcal{E}_i) = \text{(say) } \delta_i \le \mu^{4m}m^{-4m}$. Using (22) and
(23), we get

E(A i \Y 1 ,Y 2 ,...Yl-i;£ i ) 

> 11 - 32( ^ + ff)r £(majc — (W0Jmln(10m/fi) Pj (i - 1) + (£( 7 ") 2 ) 1/2 )) 

n j ^Jp] V 

> 11 - c(n + cr) 5/2 r v / ln(10m///) - > fj, - 0(/i 2 ), 



using m < ^n(/Lt 3 + a 2 ). So, we get recalling (21), 

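$$\mathrm{Var}(\Delta_i \mid Y_1, Y_2, \ldots, Y_{i-1}; \mathcal{E}_i) = E(\Delta_i^2 \mid Y_1, Y_2, \ldots, Y_{i-1}; \mathcal{E}_i) - \left(E(\Delta_i \mid Y_1, Y_2, \ldots, Y_{i-1}; \mathcal{E}_i)\right)^2 \le c(\mu^3 + \sigma^2),$$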

using ^ < /i < r2 ^ g - . This completes the proof of the Lemma. □ 
Now, we have for the worst-case conditioning,

$$\mathrm{Var}(\Delta_i \mid Y_1, Y_2, \ldots, Y_{i-1}) \le E(\Delta_i^2 \mid Y_1, Y_2, \ldots, Y_{i-1}) \le c\mu^2.$$

We now appeal to (8) to see that these also give upper bounds on $\mathrm{Var}(X_i)$. As promised, dealing
with higher moments is easy: note that $|\Delta_i| \le 1$ implies that $L_{i,2l} \le L_{i,2}$. Now to apply the
Theorem, we have $L_{i,2l} \le c(\mu^3 + \sigma^2)$. So the "L terms" are bounded as follows:

$$\sum_{l=1}^{m/2}\frac{m^{1-(1/l)}}{l^2}\left(n(\mu^3 + \sigma^2)\right)^{1/l} \le c\,n(\mu^3 + \sigma^2),$$



noting that $m \le n(\mu^3 + \sigma^2)$ implies that the maximum of $\left((n/m)(\mu^3 + \sigma^2)\right)^{1/l}$ is attained at $l = 1$ and
also that $\sum_l (1/l^2) \le 2$. Now, we work on the $M$ terms in the Theorem. $\max_i \delta_i \le \mu^{4m}m^{-4m} = \delta^*$
(say).

m/2 n m/2 

^(l/nl^K.^^e^, 

Z=l i=l 1=1 

where $h(l) = \frac{m}{2l}\log n + \frac{m}{l(m-2l+2)}\log\delta^*$. We have $h'(l) = -\frac{m}{2l^2}\log n - (\log\delta^*)\frac{m(m-4l+2)}{l^2(m-2l+2)^2}$. Thus for
$l \ge (m/4) + (1/2)$, $h'(l) \le 0$ and so $h(l)$ is decreasing. Now for $l < (m/4) + (1/2)$, we have
^logn > —(log 5*) plm-21+2)' 1 ' so a S am ^'(0 ^ 0- Thus, /i(/) attains its maximum at I = 1, 
so $(36m)^{m+2}\sum_{l=1}^{m/2} e^{h(l)} \le m(36m)^{m+3}n^{m/2}\delta^*$, giving us $(36m)^{m+2}\sum_l \frac{1}{nl^2}\sum_i (n\hat{M}_{i,2l})^{m/2l} \le (cmn(\mu^3 +
\sigma^2))^{m/2}$. Thus we get from the Main Theorem that $E(f - Ef)^m \le (cmn(\mu^3 + \sigma^2))^{m/2}$, from which
Theorem 8 follows by the choice of $m = \left\lfloor \frac{t^2}{c_5 n(\mu^3 + \sigma^2)} \right\rfloor$.

10.1 Lower Bound on Spread for Bin Packing 

This section proves the last statement in the theorem. Suppose the distribution is : 
$$\Pr\left(Y_1 = \frac{k-1}{k(k-2)}\right) = \frac{k-2}{k-1}; \qquad \Pr\left(Y_1 = \frac{1}{k}\right) = \frac{1}{k-1}.$$
This is a "perfectly packable distribution" (a well-studied class of special distributions) ($k - 2$ of the
large items and 1 of the small one pack.) Also, $\sigma$ is small. But we can have number of $1/k$ items
equal to ^ - cyf. Number of bins required > X t = £ + ^ + (i - l)) > ^. 

So at least bins contain only (k — l)/k(k — 2) sized items (the big items). The gap in each 

such bin is at least $1/k$ for a total gap of $\Omega(\sqrt{n}/k^{3/2})$. On the other hand, if the number of small
items is at least $n/(k-1)$, then each bin except two is perfectly fillable.

11 Longest Increasing Subsequence 

Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d., each distributed uniformly in $[0, 1]$. We consider here $f(Y)$ = the length
of the longest increasing subsequence (LIS) of $Y$. This is a well-studied problem. It is known that
$Ef = (2 + o(1))\sqrt{n}$ (see for example [5]). Since changing one $Y_i$ changes $f$ by at most 1, traditional
H-A yields $N(0, n)$ tails which is not so interesting. Frieze [23] gave a clever argument (using a
technique Steele [51] calls "flipping") to show concentration in intervals of length $n^{1/3}$. Talagrand
[13] gave the first (very simple) proof of $N(0, \sqrt{n})$ tails. Here, we also supply a (fairly simple) proof
from Theorem 7 of $N(0, \sqrt{n})$ tails. [But by now better intervals of concentration, namely $O(n^{1/6})$,
are known, using detailed arguments specific to this problem [9].] Our argument follows from two
claims below. Call $Y_i$ essential for $Y$ if $Y_i$ belongs to every LIS of $Y$ (equivalently, $f(Y \setminus Y_i) =
f(Y) - 1$.) Fix $Y_1, Y_2, \ldots, Y_{i-1}$ and for $j \ge i$, let $a_j = \Pr(Y_j \text{ is essential for } Y \mid Y_1, Y_2, \ldots, Y_{i-1})$.
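
The two definitions above (LIS length and essential elements) can be computed directly. This is a minimal sketch, assuming strictly increasing subsequences and using the standard patience-sorting technique for the length; the function names are illustrative, and the essential test simply applies the equivalent characterisation $f(Y \setminus Y_i) = f(Y) - 1$.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[j] = smallest possible last value of an increasing run of length j + 1
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def essential_indices(seq):
    """Indices i such that deleting seq[i] lowers the LIS length by one."""
    full = lis_length(seq)
    return [i for i in range(len(seq))
            if lis_length(seq[:i] + seq[i + 1:]) == full - 1]

y = [0.3, 0.1, 0.2, 0.9, 0.5]
print(lis_length(y))         # 3, e.g. 0.1 < 0.2 < 0.9
print(essential_indices(y))  # [1, 2]: 0.1 and 0.2 lie on every LIS
```

Here 0.9 is not essential because 0.1, 0.2, 0.5 is another LIS avoiding it.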

Claim 3. $a_i, a_{i+1}, \ldots, a_n$ form a non-decreasing sequence.

Proof Let $j \ge i$. Consider a point $\omega$ in the sample space where $Y_j$ is essential, but $Y_{j+1}$ is
not. Map $\omega$ onto $\omega'$ by swapping the values of $Y_j$ and $Y_{j+1}$; this is clearly a 1-1 measure preserving
map. If $\theta$ is a LIS of $\omega$ with $j \in \theta$, $j + 1 \notin \theta$, then $\theta \setminus j \cup j + 1$ is an increasing sequence in $\omega'$;
so $f(\omega') \ge f(\omega)$. If $f(\omega') = f(\omega) + 1$, then an LIS $\alpha$ of $\omega'$ must contain both $j$ and $j + 1$ and
so contains no $k$ such that $Y_k$ is between $Y_j, Y_{j+1}$. Now $\alpha \setminus j$ is an LIS of $\omega$ contradicting the
assumption that $j$ is essential for $\omega$. So $f(\omega') = f(\omega)$. So, $j + 1$ is essential for $\omega'$ and $j$ is not. So,
$a_j \le a_{j+1}$. □

Claim 4. $a_i \le c/\sqrt{n - i + 1}$.

Proof $a_i \le \frac{1}{n-i+1}\sum_{j \ge i} a_j$. Now $\sum_{j \ge i} a_j = a$ (say) is the expected number of essential
elements among $Y_i, \ldots, Y_n$ which is clearly at most $Ef(Y_i, Y_{i+1}, \ldots, Y_n) \le c\sqrt{n - i + 1}$, so the claim
follows. □

$\Delta_i$ is a 0-1 random variable with $E(\Delta_i \mid Y_1, Y_2, \ldots, Y_{i-1}) \le c/\sqrt{n - i + 1}$. Thus it follows (using
(8)) that

$$E(X_i^2 \mid Y_1, Y_2, \ldots, Y_{i-1}) \le c/\sqrt{n - i + 1}.$$

Clearly, $E(X_i^l \mid Y_1, Y_2, \ldots, Y_{i-1}) \le E(X_i^2 \mid Y_1, Y_2, \ldots, Y_{i-1})$ for $l \ge 2$, even. Thus we may apply the
main Theorem with $\mathcal{E}_{il}$ equal to the whole sample space. Assuming $p \le \sqrt{n}$, we see that (using
$\sum_l (1/l^2) = O(1)$)

$$E(f - Ef)^p \le (c_1 p)^{(p/2)+2}\,n^{p/4},$$

from which one can derive the asserted sub-Gaussian bounds.

12 Number of Triangles in a random graph 

Let / = f(G(n,p)) be the number of triangles in the random graph G(n,p), where each edge is 
independently put in with probability p. There has been much work on the concentration of /. 
[33], [35] discuss in detail why Talagrand's inequality cannot prove good concentration when $p$, the
edge probability, is $o(1)$. [But we assume that $np \ge 1$, so that $Ef = O(n^3p^3)$ is $\Omega(1)$.] It is known
(by a simple calculation, see [26]) that

$$\mathrm{Var}\,f = O(\mathrm{MAX}(n^3p^3, n^4p^5)).$$

Our main result here is that $f$ has $N(0, \mathrm{Var}\,f)$ tails upto $O^*((np)^{7/4})$, where, as usual, the * hides
log factors. By a simple example, we see that $f$ does not have $N(0, \mathrm{Var}\,f)$ tails beyond $(np)^{9/4}$.
We note that our result is the first sub-Gaussian tail bound (with the correct variance) for the case
when $p \le 1/\sqrt{n}$. [For the easier case when $p = n^{-\alpha}$, $\alpha < 1/2$, such a tail bound was known [45],
but only upto $(np)^{\varepsilon}$ for a small $\varepsilon > 0$.]

The most popular question about concentration of $f$ has been to prove upper bounds on
$\Pr(f \ge (1 + \varepsilon)Ef)$ for essentially $\varepsilon \in \Omega(1)$ (see [33], [57]), i.e., for deviations as large as $\Omega(Ef)$. In
a culmination of this line of work, [28] have proved that

$$\Pr(f \ge (1 + \varepsilon)Ef) \le ce^{-c\varepsilon^2n^2p^2}.$$

This is a special case of their theorem on the number of copies of any fixed graph in $G_{n,p}$. Their
main focus is large deviations, but for general $t$, putting $\varepsilon = t/n^3p^3$ would only give us $e^{-t^2/(n^4p^4)}$.
Also, [33] develops a concentration inequality specially for polynomial functions of independent
bounded random variables and [IS] develops and surveys many applications of this inequality;
[45] discusses the concentration of the number of triangles as the "principal example".

Theorem 9. $f$ has $N(0, \mathrm{Var}\,f)$ tails upto $O^*((np)^{7/4})$.

Proof Let $Y_i$ be the set of neighbors of vertex $i$ among $[i - 1]$ and imagine adding the $Y_i$ in
order. [This is often called the vertex-exposure Martingale.] We will also let $Y_{ij}$ be the 0-1 variable
denoting whether there is an edge between $i$ and $j$ for $j < i$. The number of triangles $f$ can be
written as $f = \sum_{i > j > k} Y_{ij}Y_{jk}Y_{ik}$.
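
The vertex-exposure view of the triangle count can be spelled out in a few lines. This is a sketch only: the helper names are illustrative, vertices are numbered from 0, and each triangle is charged to its largest vertex $i$, which is exactly the term $Y_{ij}Y_{jk}Y_{ik}$ with $i > j > k$.

```python
import random
from itertools import combinations

def sample_graph(n, p, rng):
    """Adjacency sets of G(n, p): each pair is an edge independently with probability p."""
    adj = [set() for _ in range(n)]
    for j, i in combinations(range(n), 2):  # pairs with j < i
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def triangles_by_vertex_exposure(adj):
    """Return (total, per_vertex); per_vertex[i] counts triangles whose largest vertex is i."""
    per_vertex = []
    for i in range(len(adj)):
        earlier = sorted(v for v in adj[i] if v < i)  # this is the set Y_i
        count = sum(1 for j, k in combinations(earlier, 2) if k in adj[j])
        per_vertex.append(count)
    return sum(per_vertex), per_vertex

adj = sample_graph(30, 0.2, random.Random(1))
total, per_vertex = triangles_by_vertex_exposure(adj)
print(total, per_vertex)
```

Exposing $Y_1, Y_2, \ldots$ in order reveals `per_vertex[i]` at step $i$; the martingale difference $X_i$ additionally accounts for the change in the expected number of future triangles through $i$.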

As usual consider the Doob Martingale difference sequence

$$X_i = E(f \mid Y_1, Y_2, \ldots, Y_i) - E(f \mid Y_1, Y_2, \ldots, Y_{i-1}).$$

It is easy to see that

$$X_i = \sum_{j < k \in [i-1]} Y_{jk}(Y_{ij}Y_{ik} - p^2) + (n - i)p^2\sum_{j < i}(Y_{ij} - p) = X_{i,1} + X_{i,2} \ \text{(say)}.$$

Let $E^i$ denote $E(\cdot \mid Y_1, Y_2, \ldots, Y_{i-1})$. We will be applying our main concentration inequality Theorem
7 with $m = O(t^2/\mathrm{Var}\,f)$. Let $q$ be any even integer between 2 and $m$. $E^i(X_i^q) \le 2^qE^i(X_{i,1}^q) +
2^qE^i(X_{i,2}^q)$. Of the two, it is much easier to deal with $X_{i,2}$. Indeed we have using Corollary Q.

$$E^i(X_{i,2}^q) \le c^qn^qp^{2q}(npq)^{q/2} \le (cn^3p^5q)^{q/2}. \qquad (24)$$

Let $\mathcal{E}_i$ be the event: (recall, as always, $c$ stands for poly($\log n$) and may have different values in
different places)

$$\mathcal{E}_i : \ |Y_j| \le cnp \text{ for } j < i; \quad \forall S \subseteq [i-1] \text{ with } |S| \le cnp, \text{ we have } \sum_{j,k \in S} Y_{jk} \le \max(cn^2p^3, cnp).$$
Now,

$$E^i(X_{i,1}^2) = \sum_{j_1 < k_1 < i}\ \sum_{j_2 < k_2 < i} Y_{j_1k_1}Y_{j_2k_2}\,E(Y_{ij_1}Y_{ik_1} - p^2)(Y_{ij_2}Y_{ik_2} - p^2).$$

Only terms where there are 2 or 3 distinct vertices among $j_1, j_2, k_1, k_2$ contribute to the expectation.
The number of terms with 2 distinct vertices (and thus only one edge in $[i - 1]$) is at most $n^2p$ under
$\mathcal{E}_i$ and $E(Y_{ij_1}Y_{ik_1} - p^2)^2 \le p^2$, so the contribution of these terms is $O(n^2p^3)$. If there are 3 distinct
vertices, we have a path of length 2 in $[i - 1]$; there are $n^2p$ choices for the first edge of the path
and $np$ choices of second edge under $\mathcal{E}_i$; finally, we have $|E(Y_{ij_1}Y_{ik_1} - p^2)(Y_{ij_1}Y_{ik_2} - p^2)| = O(p^3)$;
so the total of these terms is $O(n^3p^5)$. Thus, under $\mathcal{E}_i$, we have

$$E^i(X_{i,1}^2) \le c\,\mathrm{Var}\,f/n.$$

Further, under $\mathcal{E}_i$, $|X_{i,1}| \le a$, where $a = \max(cn^2p^3, cnp)$, so we have for any even $l \ge 2$, $E^iX_{i,1}^l \le
\mathrm{Var}\,f\,a^{l-2}/n$. We note also that $E^i(X_{i,2}^l) \le \mathrm{Var}\,f\,a^{l-2}/n$, since $q \le m \le O^*(\sqrt{np})$ as is easy to see.
Plugging these bounds into the "L terms" of Theorem 7, we get

™/ 2 i /v- r.._.\Vi _ 1 /v„w\ Vi 



^ Z 2 (, "I J < ° a E /2 ( fl 2 m J 



Since by the choice of $m$, we have $ma^2 \le \mathrm{Var}\,f$, the maximum of $\left(\frac{\mathrm{Var}\,f}{a^2m}\right)^{1/l}$ is attained at $l = 1$.
Also $\sum_l (1/l^2) \in O(1)$. So, we have

m/2 

< (mVar/)™/ 2 . (25) 




Now, we bound the $M$ terms. Since the expected number of edges within a particular $S \subseteq [i - 1]$
with $|S| \le cnp$ is $O(n^2p^3)$, the probability that there are more than $\max(cn^2p^3, cnp)$ edges is at most
$e^{-cnp}$ for a particular $S$. Since there are at most $n^{cnp}$ $S$'s to consider, union bound gives us:

$$\delta_i = \Pr(\neg\mathcal{E}_i) \le e^{-cnp}.$$

We use a crude bound of $|X_i| \le n^2$ to get $M_{il} \le n^{2l}$. So,

m/2 



Z=l i=l J 



n m / 2l n 2m e ~ cn p/ m 



Again, it is easy to see that $m \le O^*(\sqrt{np})$; so the above is at most $(cm\,\mathrm{Var}\,f)^{m/2}$. Together with
the bound on the $L$-terms, we now have

$$E(f - Ef)^m \le (cm\,\mathrm{Var}\,f)^{m/2},$$

from which the tail bound follows by using Markov as before.
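
The passage from the moment bound to a tail bound via Markov can be made explicit: $\Pr(|f - Ef| \ge t) \le E(f - Ef)^m/t^m \le (cmV/t^2)^{m/2}$ with $V = \mathrm{Var}\,f$, and choosing $m \approx t^2/(ecV)$ gives roughly $e^{-t^2/(2ecV)}$. The sketch below is an illustration under assumptions: the constant $c$ is not specified in the text, so the default `c=1.0` and the helper names are hypothetical, and the restriction $m \le O^*(\sqrt{np})$ on the admissible range of $t$ is not enforced.

```python
import math

def nearest_even(x):
    """Nearest positive even integer to x (at least 2)."""
    return max(2, 2 * round(x / 2))

def tail_bound(t, variance, c=1.0):
    """Markov applied to the m-th moment bound (c m V)^(m/2), with m near t^2 / (e c V)."""
    m = nearest_even(t * t / (math.e * c * variance))
    return m, (c * m * variance / (t * t)) ** (m / 2)

for t in (3.0, 5.0, 8.0):
    m, bound = tail_bound(t, variance=1.0)
    print(t, m, bound, math.exp(-t * t / (2 * math.e)))
```

The last two columns track each other up to the rounding of $m$, showing the $e^{-ct^2/\mathrm{Var}\,f}$ shape of the resulting tail.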

Remark 9. It is easy to see that we do not have $N(0, \mathrm{Var}\,f)$ tails beyond $(np)^{9/4}$: just take a
random $G(n, p)$. Now add all the $(np)^{3/2}$ edges among the first $(np)^{3/4}$ vertices; the probability of all
these edges being present is $e^{-c(np)^{3/2}}$ which is $e^{-t^2/n^3p^3}$, where the deviation $t$ from $Ef$ is $(np)^{9/4}$,
namely the triangles among the first $(np)^{3/4}$ vertices.
Remark 10. The inequalities in [33] and [45] bound tails of polynomial functions of independent
variables; the papers give many applications of them. Since most of the situations considered here
are not polynomial functions, these are not applicable. But the number of triangles is a polynomial
of degree 3 in the underlying variables $Y_{ij}$ and so the main theorem of [45] (Theorem (4.2)) and
Corollaries do apply. In that theorem, we have to choose k = 2 or 3 and it is easy to see that with 
the conditions, we only get a tail bound which falls as e - "' and not t 2 as required for sub-Gaussian 
bounds. 



13 Questions 

Many interesting open questions remain. Since the TSP is a classic problem, it would be interesting 
to strengthen/generalize results for the TSP. The first is to assume more limited independence: if 
one divides the unit square into $l$ pieces which have $Y_1, Y_2, \ldots, Y_l$ as the set of points inside each
respectively, can we prove concentration when $l \in o(n)$ and $E|Y_i| = n/l$ and assuming some moment
conditions? Then, we have the question of extending concentration results under "bursts in space"
to 3 and higher dimensions and finally, there are many other combinatorial problems [H] for which
it would be interesting to prove such results.

We have not dealt much with "bursts in time", but the theorems here would seem to be applicable
to such situations. In the bin-packing problem, it would be natural to assume that at each time
$i$, one first picks the number of items which arrive at that time, then lets the items pick
their sizes either adversarially or stochastically, and to prove concentration for the minimum number
of bins. On-line versions of this problem are of interest. Queueing Theory has many examples of
handling bursts and it remains to be seen how the results here may help in that area.

The count of the number of not only triangles, but also other fixed graphs has been well-studied, 
but only for large deviations of the order of the expectation. It would be interesting to establish 
sub-Gaussian bounds as done here for triangles. This has some relation to the study of clustering 
coefficients and local communities in large (web-like) graphs. 



14 Comparisons with other inequalities 

The main purpose of this paper was to formulate and prove general probability inequalities which 
can be used to tackle the complicated combinatorial and other examples discussed. Here, we will 
compare our inequality to some others in the literature. For this we consider basic situations rather 
than complex ones to illustrate things better. 

The "sub-Gaussian" behavior $e^{-t^2/n}$ with the "correct" variance (for example in Theorem (1)
and Corollary (2)) needs that the exponent of $m$ in the upper bound in Theorem (1) be $m/2$. Moment
inequalities are of course well-studied and there are many sophisticated developments. One type
of inequality is the Rosenthal type inequalities [16], which assert for a Martingale difference sequence
$X_1, X_2, \ldots, X_n$ and even integer $m$:

$$E\left(\sum_{i=1}^n X_i\right)^m \le f(m)\left( E\left(\sum_{i=1}^n E(X_i^2 \mid X_1, X_2, \ldots, X_{i-1})\right)^{m/2} + E\max_i X_i^m \right).$$

Here, $f(m)$ has to be at least $(cm/\ln m)^m$ as shown by a simple example of [30], which means that we
cannot get sub-Gaussian bounds from these inequalities. The example is: the $X_i$ are i.i.d. Bernoulli
random variables with $X_i = 1 - (1/n)$ with probability $1/n$ and $X_i = -1/n$ with probability $1 - (1/n)$, and
$n = cm/\ln m$. For this, we have

$$E\left(\sum_{i=1}^n X_i\right)^m \ge n^m (1-(1/n))^m \Pr\big(X_i = 1-(1/n)\ \forall i\big) \ge n^{m-o(m)}.$$

Our Theorem (1) can tackle the example: Note that for $l \ge \sqrt{\ln m}$, we have $(n/m)^{(l/2)-1}\, l! \ge 1$
and since $EX_i^l \le 1/n$, the hypothesis of our Theorem (1) is satisfied. For $l < \sqrt{\ln m}$, we see that
$(n/m)^{(l/2)-1}\, l! \ge 1/n$ and this also suffices. So, our Theorem yields $E(\sum_i X_i)^m \le (cnm)^{m/2}$. But
the example proves that $f(m) \ge (cm/\ln m)^m$.
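
The lower bound on $f(m)$ can be checked by a short calculation. The following is a sketch under the assumptions that $0 < c \le 1$, that $n = cm/\ln m \ge 2$ is an integer and that $\ln m \ge 1$; it uses only the independence of the $X_i$ and $|X_i| \le 1$. The constant $c'$ is introduced here and is not optimized.

```latex
% Right-hand side of the Rosenthal-type inequality for the example.
% Independence gives E(X_i^2 | X_1,...,X_{i-1}) = E X_i^2.
\begin{align*}
E X_i^2 &= \tfrac{1}{n}\big(1-\tfrac{1}{n}\big)^2 + \big(1-\tfrac{1}{n}\big)\tfrac{1}{n^2}
         = \tfrac{1}{n}\big(1-\tfrac{1}{n}\big), \\
\Big(\sum_{i=1}^n E X_i^2\Big)^{m/2} &= \big(1-\tfrac{1}{n}\big)^{m/2} \le 1,
\qquad E\max_i X_i^m \le 1 .
\end{align*}
% Left-hand side: keep only the event that every X_i equals 1 - 1/n.
\begin{align*}
E\Big(\sum_{i=1}^n X_i\Big)^m
  \ge (n-1)^m\, n^{-n}
   = n^m \big(1-\tfrac{1}{n}\big)^m e^{-n\ln n}
   \ge n^m\, 4^{-m/n}\, e^{-cm},
\end{align*}
% using 1-x >= 4^{-x} for 0 <= x <= 1/2 and n ln n <= n ln m = cm.
% Combining the two sides (the right-hand side is at most 2 f(m)):
\[
2 f(m) \;\ge\; n^m\, 4^{-m/n} e^{-cm} \;\ge\; \Big(\frac{c'\,m}{\ln m}\Big)^{m},
\qquad c' = \tfrac{1}{2}\,c\,e^{-c}.
\]
```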

Another class of inequalities are the Burkholder [15] type inequalities which assert

$$E(X_1 + X_2 + \cdots + X_n)^m \le g(m)\, E(X_1^2 + X_2^2 + \cdots + X_n^2)^{m/2}$$

for even integers $m$ when the $X_i$ are Martingale differences. Here, since the right hand side involves
taking the expectation of a power of the sum of $n$ quantities, we only gain if we could argue (in
essence) that not many of them can be simultaneously high. Indeed, if we do not have any such
information, then the best we might say is $X_1^2 + X_2^2 + \cdots + X_n^2 \le n \max_i |X_i|^2$, which only bounds
the r.h.s. by $g(m)\, n^{m/2} E\max_i X_i^m$ and since it is known that $g(m)$ has to be at least $(cm)^{m/2}$,
this does not give as strong results as Theorem (1). [The fact that $g(m) \ge (cm)^{m/2}$ follows from
the simple example when the $X_i$ are i.i.d., each equal to $\pm 1$ with probability 1/2 each.] But, here is
a simple natural example where the Burkholder inequality provably cannot derive something as strong
as Theorem (1): let $Z_i$ be i.i.d., each Poisson with mean 1 and let $X_i = \pm Z_i$, $i = 1, 2, \ldots, n$, with
probability 1/2 each, so $EX_i = 0$. It is well-known that for even $l$, $EX_i^l = EZ_i^l = (cl)^l$, where $c$
here (and in the rest of this section) involves constant and logarithmic (in $l$) factors. Theorem (1)
directly yields $N(0, n)$ tails for $X = \sum_{i=1}^n X_i$ up to $n$. But to apply Burkholder, we must deal with
$E(\sum_i X_i^2)^{m/2}$ for even $m$. We have

$$E\left(\sum_i X_i^2\right)^{m/2} \ge \frac{1}{2}\left[\left(\sum_i EX_i^2\right)^{m/2} + nEX_1^m\right] \ge (cn)^{m/2} + (cm)^m.$$

So, the best one can ever prove is $EX^m \le (cnm)^{m/2} + (cm)^{3m/2}$. Consider a tail probability
$\Pr(|X| \ge t)$; the best we could get for this from Burkholder type inequalities is

$$\Pr(|X| \ge t) \le \frac{(cm)^{3m/2}}{t^m} + \frac{(cnm)^{m/2}}{t^m}.$$

The minimum value of $(cm)^{3m/2}/t^m$ is easily seen by Calculus to be $e^{-ct^{2/3}}$ and when $n^{3/4} \in o(t)$,
we have $t^{2/3} \in o(t^2/n)$, so we do not get $N(0, n)$ tails beyond $n^{3/4}$. One can ask if this is a cooked up
example. But it occurs naturally - in many geometric probability results for example, where, n i.i.d. 
points are picked uniformly from the unit square, it turns out that the "Poisson approximation" 
where instead one runs a Poisson process of intensity n to get the points is more useful since, then, 
points in non-intersecting regions of the square are independent pE]. In this process, clearly the 
number of points in any region of area 1/n is Poisson with mean 1 and indeed, in our TSP and 
minimum weight spanning tree analysis, we used a generalization of this, allowing longer tails and 
dependence for the generation process and were still able to use Theorem Q. 
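
The calculus step in the Burkholder comparison can be sketched as follows, treating $c$ as a fixed positive constant and $m$ as a continuous variable (both are simplifications: $m$ is really an even integer and $c$ hides logarithmic factors). The symbols $h$ and $m^*$ are introduced here.

```latex
% Minimize (cm)^{3m/2} / t^m over m > 0 by minimizing its logarithm h(m).
\begin{align*}
h(m)   &= \tfrac{3m}{2}\ln(cm) - m\ln t, \\
h'(m)  &= \tfrac{3}{2}\ln(cm) + \tfrac{3}{2} - \ln t = 0
          \iff m = m^* = \frac{t^{2/3}}{ce}, \\
h(m^*) &= m^*\Big(\tfrac{3}{2}\big(\tfrac{2}{3}\ln t - 1\big) - \ln t\Big)
        = -\tfrac{3}{2}\,m^* = -\frac{3}{2ce}\,t^{2/3}.
\end{align*}
% h''(m) = 3/(2m) > 0, so m^* is indeed a minimum.
% Comparison with the Gaussian exponent t^2/n:
\[
\frac{t^{2/3}}{t^2/n} = \frac{n}{t^{4/3}} \to 0
\iff n^{3/4} \in o(t).
\]
```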

The author has received many queries about how particular inequalities (the literature is clearly
rich in this area with a number of clever papers, a majority appearing in the venerable journal
Annals of Probability) compare to the theorems here. An exhaustive comparison with each inequality
in the literature would not be possible. But some more comparisons are given here. We consider
three particular corollaries of our theorems: Generalized Chernoff bounds (GC) (Corollary (2),
Remarks (4) and (8)), H-A and the Poisson example above. Our theorems can derive tail bounds for
all of these.

A recent result on the line of Burkholder inequalities is, for example, one in [36], which asserts
that

$$E(X_1 + X_2 + \cdots + X_n)^m \le (cmn)^{m/2} EX_1^m,$$

where $m$ is again even and the $X_i$ are now stationary Martingale differences. The $m^{m/2}$ is promising
for getting sub-Gaussian bounds, but the high moment $EX_1^m$ on the right hand side means that
Chernoff bounds don't follow from this. On the other hand, for stationary martingale differences,
this is a strengthening of H-A. Talagrand's inequality can of course derive Chernoff bounds, but 
it only applies to independent random variables and so cannot derive H-A or GC. There are also 
inequalities based on the beautiful technique of Decoupling, for example Theorems 1.2A to 1.5B 
of [18]. This works only for Martingale differences, requires bounds similar to our theorem Q, 
but for all moments, not just up to an $m$, precluding our Corollary (2) and all other applications
assuming only finite moments. But, we note that this does tackle the Poisson example and indeed, 
our theorem is close in spirit to this, as discussed in Remark Q. [Needless to add all the 
inequalities mentioned have their virtues which for want of space, we do not describe.] Here is a 
little table summarizing these comparisons. 





Inequality   Poiss   H-A   GC
THM 1        Yes     Yes   Yes
Rosent       X       X     X
Burkh        X       Yes   Yes
Decoup       Yes     Yes   X
Talag        ???     X     X

Legend: Poiss - the Poisson example above. GC - Generalized Chernoff.

Our crucial advantage is that while earlier moment inequalities generally do not focus on
differentiating between the coefficients of different moments, the current paper pays particular attention
to the terms involving different moments. We are able to get a smaller coefficient on the higher 
moments which thus matter less; this is helpful, since lower moments are easier to bound tightly. 
This enables us to get the sub-Gaussian tails in the combinatorial situations discussed, whereas 
traditional inequalities do not get such bounds. It is worth noting that if we settle for an extra 
factor of m m / 2 in the bounds of our Theorem Q (thus abandoning correct Gaussian tails) and also 
restrict only to Martingale differences instead of 0, then Burkholder's inequality would imply the 
theorem. 

Another family of inequalities are the Efron-Stein type inequalities. A recent result of Boucheron,
Bousquet, Lugosi and Massart [13] proves concentration for a real-valued function $F$ of
independent random variables $Y_1, Y_2, \ldots, Y_n$. Let $Z = F(Y_1, Y_2, \ldots, Y_n)$ and suppose
$Z_i = Z_i(Y_1, Y_2, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n)$ are arbitrary functions. Their main theorem is that



$$E\left((Z - EZ)_+\right)^m \le (cm)^{m/2}\, E\left(\sum_i (Z - Z_i)^2\right)^{m/2} \qquad (26)$$



[In the setting of independent random variables, this is in a way similar to Burkholder.] 

Here, again, we sum up the variations in $Z$ caused by all the $n$ variables and then take a high
moment of it. The advantage of this would be in situations where one can show that not too
many of the individual $Y_i$ cause large changes for typical $Y_1, Y_2, \ldots, Y_n$. [See [13].] This general line of
approach is also reminiscent of Talagrand's inequality; but Talagrand allows simultaneous change of
variables. Note that (26) has an exponent of $m/2$ on the $m$ which can lead to the ideal sub-Gaussian
behavior. In contrast, our inequality (like Rosenthal's) only considers variations of one individual
variable at a time which is in many cases easier to bound. We saw this in the case of Bin-Packing,
coloring and other examples. Even for the classical Longest Increasing Subsequence (LIS) problem,
where for example, Talagrand's crucial argument is that only a small number $O(\sqrt{n})$ of elements
(namely those in the current LIS) cause a decrease in the length of the LIS by their deletion, we
are able to bound individual variations (in essence arguing that EACH variable has roughly only a
$O(1/\sqrt{n})$ probability of changing the length of the LIS) sufficiently to get a concentration result.
Note that if one can only handle individual variations, then (26) again essentially yields only



$$E\left((Z - EZ)_+\right)^m \le (cmn)^{m/2} \max_i E(Z - Z_i)^m.$$

In this case, arguments as in Theorem (1), as well as what we do for Bin-Packing and LIS, which are
based mainly on the second moment, do not work, since the above involves a high moment. There
are many other specialized ingenious probability inequalities in the literature; we have only touched
upon general ones.
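
The passage from (26) to the bound involving $\max_i E(Z - Z_i)^m$ can be sketched as follows. It uses only the convexity of $x \mapsto x^{m/2}$ for $m \ge 2$; the shorthand $a_i$ and $k$ is introduced here for illustration.

```latex
% Power-mean (Jensen) inequality for a_i >= 0 and k = m/2 >= 1:
\[
\Big(\sum_{i=1}^n a_i\Big)^{k} \le n^{k-1}\sum_{i=1}^n a_i^{k}.
\]
% Apply it with a_i = (Z - Z_i)^2 and take expectations:
\begin{align*}
E\Big(\sum_{i=1}^n (Z-Z_i)^2\Big)^{m/2}
  \le n^{m/2-1}\sum_{i=1}^n E(Z-Z_i)^m
  \le n^{m/2}\max_i E(Z-Z_i)^m .
\end{align*}
% Substituting into (26):
\[
E\big((Z-EZ)_+\big)^m \le (cm)^{m/2}\, n^{m/2}\max_i E(Z-Z_i)^m
  = (cmn)^{m/2}\max_i E(Z-Z_i)^m .
\]
```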

Besides situations like the JL theorem, the Strong Negative correlation condition is also satisfied
by the so-called "negatively associated" random variables ([29], [2T], [Jl] for example). Variables in
occupancy (balls and bins) problems, 0-1 variables produced by a randomized rounding algorithm
of Srinivasan [JD] etc. are negatively associated.

Acknowledgements Thanks to David Aldous, Alesandro Arlotto, Alan Frieze, Svante Janson, 
Manjunath Krishnapur, Claire Mathieu, Assaf Naor, Yuval Peres and Mike Steele, for helpful 
discussions. 